#257 2025-07-03
dots.ocr: A 1.7B Vision-Language Model That Beats GPT-4o at Document Parsing
rednote-hilab's dots.ocr packs SOTA OmniDocBench performance into a 1.7B-parameter VLM, outperforming Qwen2-VL-72B and GPT-4o on key OCR benchmarks while running on a single GPU.