Abaka Pulse: Latest Insights in AI & Data | April 26 - May 12

Abaka Pulse is a biweekly newsletter published by Abaka AI, focusing on the latest trends in data engineering, open-source dataset strategies, and industry benchmarks.

Every two weeks, we explore what’s shaping the future of intelligent data infrastructure – from milestone updates on our MooreData Platform and full-stack dataset engineering services, to the open-source frontier of 2077AI, to standout research and movements across the global AI community.

1. At the Core | What We're Exploring

The Evolution of AI Model Evaluation: Beyond Traditional Metrics

The rapid advancement of AI capabilities has sparked a critical discussion about how we evaluate large language models and AI systems. Traditional benchmarks are increasingly insufficient for capturing the nuanced capabilities of modern AI systems.
Our SuperGPQA initiative directly addresses this challenge by introducing a novel evaluation framework across 285 graduate disciplines, while our COIG-P dataset represents a significant step forward in measuring and improving Chinese language model alignment with human preferences.

2. Latest Insights | Knowledge, Releases, Ideas

2077AI Blog

As an Organization Contributor to the 2077AI Foundation, Abaka AI works closely with the foundation to advance global AI development through our MooreData Platform and an international network spanning Silicon Valley, Singapore, Paris, and Tokyo.

Learn more about our research initiatives at 2077AI.

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
- A groundbreaking evaluation framework exploring LLM capabilities across diverse academic disciplines.
- Essential reading for AI researchers and engineers working on model evaluation and benchmarking.

OmniHD-Scenes: Next-Generation Multimodal Dataset for Autonomous Driving
- A collaborative achievement introducing comprehensive, high-definition datasets for autonomous driving research.
- Features unprecedented scene diversity and modern data perspectives for automotive AI development.

Recent Publications

FormalMATH: Advancing Mathematical Reasoning in LLMs
- Large-scale Lean4 benchmark with 5,560 formally verified problems
- Novel human-in-the-loop autoformalization pipeline
- Critical insights into LLM mathematical reasoning capabilities

COIG-P: Pioneering Chinese Language Model Alignment
- Comprehensive Chinese preference dataset with 1M+ pairs
- Spans 6 diverse domains: Chat, Code, Math, Logic, Novel, and Role
- Demonstrates significant performance improvements in Chinese LLMs

3. Powered by Molar | PCAT 3.0

MooreData Platform has released Point Cloud Annotation Tool (PCAT) 3.0. This update enhances point cloud data annotation efficiency and accuracy, delivering comprehensive user experience improvements, including:

Large-scale Point Cloud Processing
- Supports billion-level 4D global map data
- Enables smooth annotation of massive datasets

4D Annotation Assistance
- Enhanced snapping logic
- Precise lane line adherence to ground point clouds

Enhanced Interface & Tools
- Streamlined UI for faster operations
- Refined annotation toolkit for higher precision

Attribute Tracking
- Efficient object attribute monitoring across consecutive frames
- Intuitive visualization of attribute changes

These updates boost annotation productivity and accuracy, giving users smoother operations and enhanced functionality. Experience the new features in PCAT 3.0, now available on our MooreData Platform.

4. On the Ground | Where We Are & Who We're Talking To

ICLR 2025: Bridging AI Innovation in Singapore

From April 24-28, we joined the global AI community at ICLR 2025 in Singapore. At booth #H06, we showcased our latest intelligent data engineering solutions and engaged with researchers and industry leaders from around the world.

Upcoming Event: ICRA 2025

We're excited to announce our participation in the upcoming IEEE International Conference on Robotics and Automation (ICRA 2025), taking place May 20-23.

Learn more about ICRA 2025 →

Detailed information about our conference activities will be shared soon.

5. On Our Radar | What We’re Reading

In this issue, we focus on two emerging research directions that are reshaping AI evaluation methodologies: formal mathematical reasoning and physics perception assessment.

Formal Problem-Solving: Beyond Theorem Proving

Pioneering framework redefining formal mathematics

SJTU-ReThinkLab's FPS framework transcends traditional theorem proving, enabling end-to-end formalization from problem comprehension to solution verification. This research complements our FormalMATH project while opening new paths for advancing AI's mathematical reasoning capabilities.

PHYBench: Revolutionizing Physics Reasoning Evaluation

Innovative assessment methodology for AI physics intelligence

Peking University's PHYBench introduces the groundbreaking EED Score mechanism, enabling granular evaluation of AI models' physics problem-solving abilities. Its curated set of 500 problems reveals significant gaps in current top models' physical perception and reasoning, providing valuable insights for our benchmark development.

These studies align with our ongoing exploration in AI evaluation and formal reasoning, charting the course for next-generation AI systems.

Stay tuned with Abaka. See you next Pulse!