Google’s Gemini 3 introduces a new era of "Deep Think" reasoning and agentic workflows, shattering benchmarks like GPQA Diamond and ARC-AGI-2. Crucially for enterprise applications, it also demonstrates exceptional document parsing capabilities on the 2077AI Open Source Foundation’s OmniDocBench, signaling a major leap forward for complex data processing. While the model sets a new standard, deploying it effectively requires the rigorous evaluation and high-quality data infrastructure that Abaka AI provides.
Abaka AI’s OmniDocBench Standardizes Gemini 3’s Document Intelligence

Google Gemini 3 Sets New SOTA on OmniDocBench: The New Standard for Document AI
The verdict is in: The world's smartest AI models are now being graded on data built by Abaka AI.

Yesterday, the AI world shifted with Google DeepMind’s release of Gemini 3, its most capable multimodal model to date. While the headlines focus on its reasoning capabilities, a deeper look into the technical report reveals a critical detail for the data industry:
OmniDocBench 1.5, a benchmark co-developed by 2077AI with contributions from Abaka AI, was selected as the core standard to evaluate Gemini 3’s Optical Character Recognition (OCR) and document understanding performance.
This follows closely on the heels of DeepSeek-OCR citing the same benchmark. The pattern is undeniable: when top-tier labs need to prove their models can handle the messy, complex reality of the visual world, they turn to OmniDocBench.
Gemini 3's Performance: A Leap Forward
According to Google’s report, Gemini 3 Pro achieved an Edit Distance of 0.115 on OmniDocBench 1.5 (lower is better).
This score isn't just a number; it represents a new State-of-the-Art (SOTA), outperforming formidable competitors like GPT-5.1 (0.147) and Claude Sonnet 4.5 (0.145).

Gemini 3 Pro sets a new record on OmniDocBench 1.5, validating its superior document processing capabilities against the industry's toughest test.
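For readers unfamiliar with the metric, here is a minimal Python sketch of a normalized edit distance. It is an illustration only: dividing the Levenshtein distance by the length of the longer string is our assumption for this example, and the function names are hypothetical rather than taken from the OmniDocBench evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Scale the raw distance into [0, 1]; lower is better.

    Assumption: normalize by the longer string's length. The official
    benchmark scoring may normalize differently.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(prediction, reference) / longest
```

Under this normalization, 0.0 means the model's output matches the reference character for character, and values near 1.0 mean almost every character had to be edited. Calling `normalized_edit_distance(model_output, ground_truth)` per page and averaging would give a single headline number of the kind quoted above.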
The Engine Behind the Benchmark: Abaka AI's Data Pipeline
Why has OmniDocBench become the de facto industry standard so quickly? Because it was built so that simple models cannot beat it.
Building a benchmark of this caliber isn't just about collecting PDFs; it’s about extreme Data Engineering. This is where Abaka AI steps in. As the core data partner for 2077AI, our advanced data construction pipeline was the engine that powered OmniDocBench’s creation.

To challenge models like Gemini 3 and GPT-5, we couldn't rely on clean, digital-born academic papers. We had to engineer complexity.
1. Engineering Diversity
We constructed a dataset spanning 9 distinct document types, including notoriously difficult formats like:
- Handwritten Notes: Testing the limits of vision-text alignment.
- Multi-Column Newspapers: Challenging layout analysis and reading order logic.
- Financial Reports: Requiring precise extraction from dense, borderless tables.
2. Engineering Granularity
Standard OCR datasets give a single pass/fail verdict. Abaka AI’s pipeline annotated 19 layout categories and 15 attribute labels for every single page. This granular labeling allows researchers at Google and DeepSeek to diagnose exactly why a model fails, whether the cause is a rotated table header or a complex mathematical formula.
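To make the diagnostic value concrete, here is a hedged Python sketch of how per-page labels could be used to break one aggregate score into per-label averages. The record layout, the key names (`doc_type`, `edit_distance`), and the helper name are hypothetical choices for illustration; they are not the OmniDocBench schema or tooling.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_label(pages, label_key):
    """Average the per-page score separately for each value of one label.

    `pages` is a list of dicts. Each dict is assumed (hypothetically) to
    carry an "edit_distance" score plus the label named by `label_key`.
    """
    buckets = defaultdict(list)
    for page in pages:
        buckets[page[label_key]].append(page["edit_distance"])
    return {label: mean(scores) for label, scores in buckets.items()}


# Made-up records for illustration only; these are not benchmark results.
example_pages = [
    {"doc_type": "newspaper", "edit_distance": 0.25},
    {"doc_type": "newspaper", "edit_distance": 0.75},
    {"doc_type": "financial_report", "edit_distance": 0.5},
]
per_type = mean_score_by_label(example_pages, "doc_type")
```

Grouping by a different label key, such as a table-rotation attribute, would show whether a weak overall score comes from one specific failure mode rather than from uniformly poor recognition.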
3. Engineering Precision
Our "Human-in-the-Loop" pipeline ensures ground-truth accuracy that meets the rigorous standards of top research labs. When Google measures an Edit Distance difference of 0.002, they need to know that the benchmark itself is pixel-perfect. Abaka AI delivered that precision.
Great AI Starts with Great Data
The adoption of OmniDocBench by Google DeepMind validates a core truth of the Generative AI era: Model architecture is converging; data is the differentiator.
Whether you are training the next GPT-5 or fine-tuning a specialized vertical model, the ceiling of your performance is defined by the quality and complexity of your data.
At Abaka AI, we don't just build datasets; we build rulers that measure intelligence. If our data pipelines can challenge Gemini 3, imagine what they can do for your models.
🚀 Explore the Industry Standard
See the benchmark that Google and DeepSeek are using to test their frontiers.
- 🌐 Visit OmniDocBench Homepage: Click Here to Explore the Data
- 📄 Read the Technical Deep Dive: The Science Behind the Benchmark
- 🤝 Partner with Abaka AI: Contact Us for Custom Data Solutions

