SoM (Set-of-Mark)

Enhance your AI's image understanding with spatial and speakable marks, improving accuracy and visual grounding abilities.

About the product

GPT-4V has been great, but its performance on certain vision tasks has been unpredictable, especially for object counting and recognition.

Microsoft released a technique called SoM, which simply overlays images with easy-to-understand marks (like numbers or letters) and turns GPT-4V into a vision pro.
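
To make the idea concrete, here is a minimal sketch of the marking step, not the SoM implementation itself. It assumes the image regions are already known (the boxes below are hand-written, hypothetical values) and only decides which numeric mark goes where, then builds a prompt that tells the model it can refer to objects by mark. The names `Mark`, `assign_marks`, and `build_prompt` are made up for illustration, and actually drawing the labels onto the image with an imaging library is left out.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    label: str  # the speakable mark, e.g. "1", "2", "3"
    x: int      # pixel column where the label would be drawn
    y: int      # pixel row where the label would be drawn


def assign_marks(boxes):
    """Give each region (left, top, right, bottom) a numeric mark at its center."""
    marks = []
    for index, (left, top, right, bottom) in enumerate(boxes, start=1):
        marks.append(Mark(str(index), (left + right) // 2, (top + bottom) // 2))
    return marks


def build_prompt(question, marks):
    """Prefix the question with a note listing the marks visible on the image."""
    labels = ", ".join(m.label for m in marks)
    return (
        f"The image is annotated with numbered marks: {labels}. "
        f"Refer to objects by their mark. {question}"
    )


# Hypothetical regions for three objects in a 640x480 image (assumed, not real data).
boxes = [(10, 20, 110, 220), (200, 40, 300, 140), (400, 300, 600, 460)]
marks = assign_marks(boxes)
prompt = build_prompt("How many apples are there, and which marks are they?", marks)
```

The marked-up image and the prompt would then be sent to the vision model together, so its answer can point at "mark 2" instead of describing a location in words.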

Try the demo: https://som-gpt4v.github.io/

GitHub repo: https://github.com/microsoft/SoM

arXiv paper: https://arxiv.org/abs/2310.11441
