SoM (Set-of-Mark)
Enhance your AI's image understanding with spatial and speakable marks, improving accuracy and visual grounding abilities.
About the product
GPT-4V is a capable vision model, but its performance on some vision tasks has been unpredictable, especially object counting and recognition.
Microsoft released a technique called SoM, which overlays images with easy-to-understand marks (such as numbers or letters) that the model can refer to by name, improving GPT-4V's accuracy and visual grounding.
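To make the idea of overlaying marks concrete, here is a minimal toy sketch in Python. It is not the code from the Microsoft repo: the `Region` class, the helper functions, the bounding boxes, and the prompt wording are all hypothetical, and a character grid stands in for a real image. In practice the regions would come from a segmentation model and the marks would be drawn onto the actual pixels.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical bounding box in pixel coordinates.
    left: int
    top: int
    right: int
    bottom: int


def assign_marks(regions):
    """Give each region a numeric mark anchored at the center of its box."""
    marks = []
    for index, region in enumerate(regions, start=1):
        center_x = (region.left + region.right) // 2
        center_y = (region.top + region.bottom) // 2
        marks.append((str(index), center_x, center_y))
    return marks


def overlay_marks(width, height, marks):
    """Place each mark on a blank character grid that stands in for the image."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for label, x, y in marks:
        grid[y][x] = label
    return ["".join(row) for row in grid]


def build_prompt(question, marks):
    """Compose a text prompt that tells the model which marks are present."""
    labels = ", ".join(label for label, _, _ in marks)
    return (
        f"The image is annotated with marks {labels}. "
        f"{question} Answer using the mark numbers."
    )


# Two made-up regions on a 12x6 "image".
regions = [Region(0, 0, 4, 2), Region(6, 1, 10, 5)]
marks = assign_marks(regions)
canvas = overlay_marks(12, 6, marks)
prompt = build_prompt("How many objects are there?", marks)

print("\n".join(canvas))
print(prompt)
```

Because every region carries a short label, a question about the image can be answered by mark number instead of a vague description of location.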
Try demo: https://som-gpt4v.github.io/
GitHub repo: https://github.com/microsoft/SoM
arXiv paper: https://arxiv.org/abs/2310.11441