Abaka Pulse: Latest Insights in AI & Data | July 19-Aug 1
At the Core | What We're Exploring
- Advancing AI Agents: The Data Foundation
The shift to AI agents is hampered by a lack of high-quality, diverse agent datasets. Existing methods often miss crucial tool interactions or require expensive human annotation. Abaka AI directly solves this, offering scalable, premium agent datasets and related services. We provide the essential data backbone for fine-tuning and evaluating agent models, speeding up the creation of truly intelligent, autonomous AI systems.
- Elevating AI with Robust Coding Datasets
The need for smart code generation and understanding is exploding. This relies on strong coding datasets that capture diverse languages, syntax, semantics, and real-world logic. Abaka AI specializes in crafting these vital datasets, empowering AI to produce more accurate code, grasp complex programs, and streamline development. Our comprehensive services build and refine your coding datasets, ensuring your AI trains on optimal data.
- Ready to unlock your AI's full potential?
Book a demo to see how Abaka AI can help you build your next-gen AI solutions, or contact us to speak with our expert team for a free data consultation.
Latest Insights | Knowledge, Releases, Ideas
Developed with valuable input from 2077AI Open Source Foundation.
- VeriGUI: Verifiable Long-Chain GUI Dataset - Paper Coming Soon!
We're thrilled to share the release of VeriGUI, a dataset from 2077AI that Abaka AI is proud to have contributed to. The dataset aims to advance general-purpose GUI agents. Unlike previous efforts that focus on short interactions, VeriGUI tackles long-chain complexity, featuring tasks with hundreds of interdependent operations that mirror real-world workflows. What sets it apart is step-by-step verifiability, which allows agents to explore and iterate at each stage. This expertly annotated dataset, spanning both desktop and web GUI tasks, is crucial for developing robust planning and decision-making in GUI agents.
Check out the details and dive into the dataset on GitHub and HuggingFace.

In a significant stride for automated theorem proving, 2077AI has released CriticLean, an automated pipeline that translates natural language mathematical statements into formally verified Lean 4 code. Abaka AI provided crucial data contributions to this work. The framework, powered by the CriticLeanGPT model and leveraging compiler feedback, dramatically improves formalization accuracy (84%, versus 54% with compiler feedback alone). CriticLean introduces CriticLeanBench for rigorous evaluation and releases FineLeanCorpus, a massive dataset of 285K verified Lean 4 formalizations, including a challenging 36K Diamond subset. This work is pivotal for reliable formal mathematical reasoning.
Check out the details and dive into the dataset on GitHub and HuggingFace.
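To make "formalization" concrete, here is a minimal Lean 4 sketch of what translating a natural language statement can look like, and why compiling alone may not be enough. Both statements and theorem names are illustrative assumptions written for this newsletter; they are not taken from FineLeanCorpus or CriticLeanBench.

```lean
-- Natural language statement (illustrative): "Adding zero to any natural number leaves it unchanged."

-- A faithful formalization: it compiles and says what the sentence says.
theorem add_zero_faithful (n : Nat) : n + 0 = n :=
  Nat.add_zero n

-- An unfaithful formalization: it also compiles, but it only talks about 0
-- and never uses n, so it does not capture the original statement.
theorem add_zero_unfaithful (n : Nat) : 0 + 0 = 0 :=
  rfl
```

The second theorem passes the compiler yet misses the meaning of the sentence, which is the kind of gap a critic model is meant to catch on top of compiler feedback.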

- Community Spotlight | Join Our 2077AI Community!
2077AI Open Source Foundation, backed by Abaka AI, has launched its Discord community! We invite all AI professionals, researchers, academics, and industry practitioners dedicated to the open-source AI ecosystem to join us. Let's discuss academic frontiers and collectively build a thriving AI environment. You can also connect with us and the community here:
At Abaka AI, we possess the expertise and resources to build world-leading, academically cutting-edge datasets. We look forward to partnering with you to push the boundaries of AI technology.
On the Ground | Where We Are & Who We're Talking To
- ONGOING: Catch Us at ACL!
If you're attending ACL 2025, don't miss the chance to connect with us! Omid Gholamzadeh Nasrabadi, Head of the Abaka European Team, is on-site and ready to discuss how Abaka AI can support your projects. Feel free to reach out to him directly at omid@abaka.ai to arrange an in-person meeting. We'd love to connect!
- PAST EVENT: Abaka AI at ICML 2025: Connecting Data & AI Agents
We're fresh off an incredible week at ICML 2025 in Vancouver! We were thrilled to take part and to host an exclusive after-party, "Abaka AI x 1943 Community @ ICML: Where Data Meets AI Agents". It was fantastic to connect with so many industry leaders, engineers, researchers, and innovators during our mixer. Thanks to everyone who joined us to network, share insights, and explore the future of AI!
On Our Radar | What We’re Reading
ByteDance Seed AI4Math Team introduces Seed-Prover, a breakthrough in automated theorem proving that can solve IMO-level mathematical problems. It leverages formal verification and reinforcement learning, proving 78.1% of past IMO problems. This aligns perfectly with Abaka's focus on high-level mathematical datasets.
Recent research presents RRVF, a framework that enables multimodal LLMs to learn complex visual reasoning from raw images alone, reducing the need for extensive text supervision. By using visual feedback for self-correction, it offers a new paradigm for training robust models and is highly relevant to our work in agents and coding.