Abaka Pulse: Latest Insights in AI & Data | May 10-May 26

1. At the Core | What We're Exploring

Audio-Language Intelligence: From Perception to Reasoning

The rise of Audio-Language Models (ALMs) signals a paradigm shift in how machines interpret multimodal real-world inputs. While speech and music understanding have long dominated the scene, recent work extends this boundary to a deeper, more reasoning-centric level.
That’s the direction 2077AI is taking: pushing toward a new frontier where audio inputs are not just processed but understood through multi-layered inference and contextual depth.

2. Latest Insights | Knowledge, Releases, Ideas

2077AI: Recent Publications

As an Organization Contributor to the 2077AI Foundation, Abaka AI collaborates closely on advancing global AI development through our MooreData Platform and an international network spanning Silicon Valley, Singapore, Paris, and Tokyo.

Learn more about our research initiatives at 2077AI.

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

MMAR evaluates the deep reasoning abilities of Audio-Language Models across diverse tasks, using 1,000 high-quality, mixed-modality audio-question-answer triplets from real-world videos. Its hierarchical questions with Chain-of-Thought rationales reveal current ALM limitations, particularly in graduate-level perceptual and domain-specific understanding.

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

We introduce OmniDocBench, a new benchmark for evaluating document content extraction across various PDF types, including challenging cases like handwritten notes. It provides a comprehensive framework to assess the strengths and weaknesses of current document parsing methods.

Additionally, we are proud to present our research poster at CVPR 2025. Stop by our poster for in-depth discussions!

3. On the Ground | Where We Are & Who We're Talking To

ICRA 2025

Last week, our team attended ICRA 2025, where we connected with researchers and engineers pushing the boundaries of embodied AI and robotics.

Together with Dexmate and RoboForce, we co-hosted a Happy Hour Mixer on May 21. With Georgian cuisine, industry insiders, and real conversations about the future of robotics + datasets, it was a night to remember.

Upcoming event: CVPR 2025

We are excited to announce our participation in the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025, the premier annual computer vision event. We will be exhibiting at Booth #1535. We warmly welcome all interested friends to visit our booth for engaging discussions and to learn more about our work in intelligent data infrastructure.

Also, Abaka AI is sponsoring several workshops at CVPR 2025:
- AI for Creative Visual Content Generation, Editing and Understanding (CVEU): This workshop brings together experts in computer graphics, HCI, computer vision, and machine learning to explore advancements in AI for assisted creative visual content.
- DriveX: Foundation Models and V2X for Cooperative Driving: This workshop focuses on integrating foundation models and V2X systems to enhance perception, planning, and decision-making in autonomous vehicles, aiming to advance road safety.

We welcome all interested friends to join us for discussions!

Please continue to follow our official website and LinkedIn (Abaka AI) for the latest updates on our sponsored workshops and our hosted afterparty.

4. On Our Radar | What We’re Reading

KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

A dynamic evaluation platform with over fifty textual or visual games, designed to comprehensively assess LLM reasoning. It supports interactive, multi-turn assessments, including reinforcement learning scenarios, to reveal consistent reasoning patterns and evaluate model performance across various factors like modality and response length.

SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents

A new multi-language benchmark for repository-level, execution-based evaluation of LLM-powered coding agents. It features 2,110 instances across 21 repositories, covering tasks in Java, JavaScript, TypeScript, and Python, along with novel syntax tree analysis metrics to reveal agents' strengths and limitations across languages and task complexities.

A Survey on Test-Time Scaling in Large Language Models: What, How, Where, and How Well?

Provides a comprehensive understanding of Test-Time Scaling (TTS) in LLMs, a prominent research focus for eliciting problem-solving capabilities in various tasks. It proposes a unified framework structured along four core dimensions (what, how, where, how well to scale) and offers insights into developmental trajectories, practical deployment guidelines, and future research directions for TTS.

Stay Tuned with Abaka Pulse!

Missed an issue? Catch up anytime in our Newsletter Archive.

See you next pulse!