The Most Comprehensive Reasoning Dataset Sharing Series: CoT-Related Datasets

Reasoning Datasets and Chain-of-Thought

Reasoning datasets are datasets designed to train and evaluate the reasoning capabilities of models. They typically cover tasks such as logical, commonsense, mathematical, and causal reasoning, and they help models handle multi-step problems and complex reasoning scenarios. With the development of large language models (LLMs) and reasoning methods like Chain-of-Thought (CoT), reasoning tasks have become increasingly important in natural language processing (NLP).

Chain-of-Thought (CoT) is a reasoning strategy in NLP that is widely applied to large language models such as GPT. Its core idea is to have the model generate an ordered chain of intermediate reasoning steps, mimicking step-by-step human problem solving, rather than jumping directly to a conclusion. Specifically, CoT decomposes a complex task into a series of subtasks, and each intermediate step supplies information that the next step builds on until the final answer is reached. By reasoning step by step, CoT not only improves accuracy on complex problems but also enhances the interpretability of models.
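
To make the idea concrete, here is a minimal sketch of how a few-shot CoT prompt might be assembled. The `Exemplar` class, the function names, the "Q:/A:" layout, and the sample questions are all our own illustrative assumptions; they are not taken from any dataset or library listed in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Exemplar:
    """A worked example: a question, its ordered reasoning steps, and the answer."""
    question: str
    steps: list[str]
    answer: str


def format_exemplar(ex: Exemplar) -> str:
    # The reasoning chain comes BEFORE the final answer; that ordering is the point of CoT.
    reasoning = " ".join(ex.steps)
    return f"Q: {ex.question}\nA: {reasoning} The answer is {ex.answer}."


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    # Worked examples first, then the new question with an open "A:" for the model to continue.
    blocks = [format_exemplar(ex) for ex in exemplars]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical exemplar, written only for this illustration.
demo = Exemplar(
    question="A shelf holds 3 boxes with 4 books each. How many books are on the shelf?",
    steps=["There are 3 boxes.", "Each box holds 4 books.", "3 * 4 = 12."],
    answer="12",
)
prompt = build_cot_prompt(
    [demo],
    "A crate holds 5 bags with 6 apples each. How many apples are in the crate?",
)
print(prompt)
```

Because the exemplar shows its intermediate steps, a model continuing the prompt is nudged to write out its own steps before committing to an answer.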

To advance the development of the CoT method, researchers and developers have created multiple open-source datasets specifically designed to evaluate and train models' reasoning capabilities, particularly for tasks involving multi-step reasoning and complex problems. In the first installment of the Reasoning Dataset Sharing Series, we have compiled open-source reasoning datasets that incorporate the CoT approach. These datasets cover various domains, from commonsense reasoning to mathematical reasoning, and from situational reasoning to paragraph comprehension.

NuminaMath-CoT

Publisher: AI-MO
Download Address: https://projectnumina.ai/
Release Year: 2024
Size: Approximately 1GB (contains roughly 860,000 math problems with step-by-step solutions)
Description: NuminaMath-CoT is a mathematical reasoning dataset designed to train and evaluate the reasoning capabilities of large language models. Each math problem in the dataset includes a step-by-step reasoning process (Chain-of-Thought, CoT), helping models maintain high accuracy when solving complex mathematical problems. The dataset is not limited to basic arithmetic; it also covers more advanced topics such as algebra, geometry, and number theory. The CoT format encourages models to lay out their problem-solving approach through multi-step logical reasoning, thereby strengthening their computational and reasoning abilities.

LLaVA-CoT-100k

Publisher: PKU-YUAN-Lab
Download Address: https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k
Release Year: 2024
Size: Approximately 10GB (contains 100,000 multi-step reasoning tasks)
Description: LLaVA-CoT-100k is a dataset containing 100,000 multi-step reasoning tasks, designed to enhance the reasoning capabilities of large language models (LLMs) in visual and language tasks. Each problem requires the model to extract key information from visual inputs and combine it with textual reasoning to derive the answer step-by-step. This dataset particularly focuses on reasoning tasks assisted by visual inputs, making it suitable for multimodal reasoning training of models.

CoT-Collection

Publisher: kaist-ai
Download Address: https://huggingface.co/datasets/kaist-ai/CoT-Collection
Release Year: 2023
Size: Approximately 3GB (contains about 1.84 million rationales across roughly 1,000 tasks)
Description: CoT-Collection is a diverse reasoning task dataset covering a wide range of fields from mathematics to logical reasoning. The dataset provides detailed reasoning processes for each problem and requires models to demonstrate complete reasoning chains during problem-solving. CoT-Collection aims to train models to handle complex reasoning problems, testing not only their computational abilities but also challenging their logical and abstract thinking skills.

cot_flan

Publisher: causal-lm
Download Address: https://huggingface.co/datasets/causal-lm/cot_flan
Release Year: 2023
Size: Approximately 3GB (contains a large number of reasoning tasks, suitable for language models)
Description: cot_flan is a dataset for language models that focuses on strengthening their reasoning capabilities through the Chain-of-Thought (CoT) method. The tasks in the dataset span multiple domains, including commonsense, logical, and mathematical reasoning. Each task requires the model to provide detailed reasoning steps, which helps it better understand complex inputs and generate high-quality outputs.

GSM8K (Grade-School Math 8K)

Publisher: OpenAI
Download Address: https://github.com/openai/grade-school-math?tab=readme-ov-file
Release Year: 2021
Size: Approximately 2GB (contains about 8,500 math problems and their solution steps)
Description: GSM8K is a dataset of about 8,500 grade-school math word problems. Each problem includes a detailed natural-language solution, so the model is expected to derive the answer step by step rather than output it directly. This makes the dataset a natural fit for Chain-of-Thought (CoT): the model shows every reasoning step along with the final answer, which helps it solve multi-step arithmetic problems and improves its reasoning and computational accuracy.
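
As a rough sketch of how these solution steps can be used, the snippet below splits a GSM8K-style solution into its reasoning steps and its final answer. It assumes the layout used in the official repository, where calculator annotations appear as `<<...>>` and the final answer follows a `####` marker; the sample record is written in that style, and the helper name `split_solution` is our own.

```python
import re

# A record in the GSM8K style: a question plus a worked solution.
record = {
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she sold "
        "half as many clips in May. How many clips did Natalia sell altogether "
        "in April and May?"
    ),
    "answer": (
        "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
        "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
        "#### 72"
    ),
}

# Matches calculator annotations such as <<48/2=24>>.
CALC_ANNOTATION = re.compile(r"<<[^<>]*>>")


def split_solution(answer: str) -> tuple[list[str], str]:
    """Return (reasoning steps without annotations, final answer string)."""
    reasoning, marker, final = answer.rpartition("####")
    if not marker:
        raise ValueError("no '####' final-answer marker found")
    steps = [
        CALC_ANNOTATION.sub("", line).strip()
        for line in reasoning.splitlines()
        if line.strip()
    ]
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return steps, final.strip().replace(",", "")


steps, final = split_solution(record["answer"])
for number, step in enumerate(steps, start=1):
    print(f"Step {number}: {step}")
print("Final answer:", final)
```

Keeping the steps and the final answer separate makes it possible to train on the full reasoning chain while scoring only the final answer.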

[Figure: example from GSM8K]

cot_gsm8k

Publisher: Dahoas
Download Address: https://huggingface.co/datasets/Dahoas/cot_gsm8k
Release Year: 2023
Size: Approximately 2.5GB (contains over 8,000 math problems)
Description: cot_gsm8k is an extended version of the GSM8K dataset, focusing on enhancing the mathematical reasoning capabilities of models through the Chain-of-Thought (CoT) method. The dataset includes a variety of math problems, covering content from basic arithmetic to advanced algebra and geometry. Each problem contains step-by-step reasoning processes, emphasizing the reasoning chain of the model when solving problems. This dataset is particularly suitable for training and evaluating AI systems with reasoning capabilities, especially for elementary and middle school-level math education scenarios.

MATH (Mathematics Dataset)

Publisher: Hendrycks et al.
Download Address: MATH GitHub
Release Year: 2021
Size: Approximately 1GB (contains 12,500 competition-style math problems)
Description: The MATH dataset contains competition-style math problems at five difficulty levels, in subjects ranging from prealgebra to precalculus, including algebra, geometry, number theory, and counting and probability. Each problem requires multi-step reasoning to arrive at the correct answer and comes with a full step-by-step solution. Chain-of-Thought (CoT) applies here by having the model decompose and analyze each problem incrementally instead of answering directly, which makes the problem-solving process more transparent.
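
Solutions in MATH are written in LaTeX, and to our understanding the final answer is marked with `\boxed{...}`. Under that assumption, the sketch below pulls the final answer out of a solution string. The function name and the sample solution are our own, and the brace matching is a simple illustration rather than a full LaTeX parser.

```python
from typing import Optional


def last_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the string, or None."""
    opener = "\\boxed{"
    start = solution.rfind(opener)
    if start == -1:
        return None
    depth = 1  # we are inside the brace opened by \boxed{
    chars = []
    for ch in solution[start + len(opener):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


# Hypothetical solution text written for this illustration.
sample = r"Dividing both sides by 4 gives $x = \frac{6}{4}$, so $x = \boxed{\frac{3}{2}}$."
print(last_boxed(sample))
```

Counting brace depth matters because answers such as fractions contain nested braces that a naive regular expression would cut short.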

[Figure: example from MATH]

CommonsenseQA

Publisher: Tel Aviv University and the Allen Institute for AI
Download Address: https://www.tau-nlp.org/commonsenseqa
Release Year: 2019
Size: Approximately 1GB (contains 12,247 questions)
Description: CommonsenseQA is a commonsense reasoning dataset of multiple-choice questions whose answers require reasoning over commonsense knowledge. Chain-of-Thought (CoT) applies here by breaking each question into several reasoning steps, so the model builds a logical chain before selecting the most plausible option. This step-by-step approach helps models handle more complex reasoning problems, especially when explicit context is lacking.
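
For multiple-choice datasets like this one, a CoT pipeline usually needs two pieces: a prompt that asks for reasoning before the choice, and a way to read the chosen letter back out. The sketch below shows one possible shape for both. The question, the options, the prompt wording, and the "Answer: <letter>" convention are all hypothetical choices of ours, not part of the dataset.

```python
import re
from typing import Dict, Optional


def format_mc_prompt(question: str, choices: Dict[str, str]) -> str:
    """Lay out the question and labeled options, then ask for step-by-step reasoning."""
    lines = [f"Question: {question}"]
    lines += [f"({label}) {text}" for label, text in choices.items()]
    lines.append("Let's think step by step, then finish with 'Answer: <letter>'.")
    return "\n".join(lines)


def extract_choice(completion: str, choices: Dict[str, str]) -> Optional[str]:
    """Return the last 'Answer: X' letter if it is a valid option label, else None."""
    matches = re.findall(r"Answer:\s*\(?([A-Z])\)?", completion)
    if matches and matches[-1] in choices:
        return matches[-1]
    return None


# Hypothetical question written for this illustration.
question = "Where would you most likely store milk to keep it fresh?"
choices = {
    "A": "cupboard",
    "B": "refrigerator",
    "C": "bookshelf",
    "D": "garage",
    "E": "backpack",
}
prompt = format_mc_prompt(question, choices)

# A made-up model completion, used only to exercise the parser.
completion = (
    "Milk spoils quickly at room temperature. "
    "A refrigerator keeps food cold. Answer: B"
)
print(prompt)
print(extract_choice(completion, choices))
```

Taking the last match rather than the first guards against a model that mentions a candidate letter in its reasoning before settling on a different final choice.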

[Figure: example from CommonsenseQA]

SWAG (Situations With Adversarial Generations)

Publisher: University of Washington and the Allen Institute for AI
Download Address: https://rowanzellers.com/swag/
Release Year: 2018
Size: Approximately 2GB (contains 113k situational questions)
Description: The SWAG dataset contains approximately 113,000 multiple-choice questions based on everyday situations, requiring the model to infer the most likely subsequent event. Chain-of-Thought (CoT) applies here by having the model break the situation down step by step and reason toward a plausible continuation. Through CoT, the model can better understand the underlying relationships in the context and select the most plausible answer.

[Figure: example from SWAG]

DROP (Discrete Reasoning Over Paragraphs)

Publisher: Allen Institute for AI (AI2) and collaborators
Download Address: https://github.com/allenai/allennlp-reading-comprehension/blob/master/allennlp_rc/eval/drop_eval.py
Release Year: 2019
Size: Approximately 2GB (contains about 96,000 questions over roughly 6,700 passages)
Description: The DROP dataset focuses on paragraph-level reasoning tasks, with questions that typically involve discrete operations such as addition, subtraction, counting, and sorting. Chain-of-Thought (CoT) applies here through multi-step reasoning and information extraction: the model first pulls the relevant facts out of the passage and then performs the calculation step by step to arrive at the final answer. This approach helps the model achieve higher accuracy in understanding text and performing numerical reasoning.
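
The "extract, then compute" pattern can be illustrated with a toy example. The passage, the question, and the regular expression below are invented for this sketch and are far simpler than real DROP items; the point is only to show a reasoning trace that records both the extraction step and the arithmetic step.

```python
import re
from typing import List, Tuple

# Hypothetical passage and question, written for this illustration.
passage = (
    "The kicker made a 20-yard field goal in the first quarter "
    "and a 35-yard field goal in the third quarter."
)
question = "How many total yards of field goals did the kicker make?"


def field_goal_yards(text: str) -> List[int]:
    """Step 1: extract every field-goal distance mentioned in the passage."""
    return [int(n) for n in re.findall(r"(\d+)-yard field goal", text)]


def answer_with_trace(text: str) -> Tuple[List[str], int]:
    """Step 2: add the extracted numbers, keeping a readable trace of both steps."""
    yards = field_goal_yards(text)
    total = sum(yards)
    trace = [
        f"Step 1: field goals found in the passage: {yards}",
        f"Step 2: {' + '.join(map(str, yards))} = {total}",
    ]
    return trace, total


trace, total = answer_with_trace(passage)
print(question)
print("\n".join(trace))
print("Answer:", total)
```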

[Figure: example from DROP]

ReClor (Reading Comprehension Dataset Requiring Logical Reasoning)

Publisher: National University of Singapore
Download Address: https://whyu.me/reclor/
Release Year: 2020
Size: Approximately 1GB (contains about 6,000 reasoning questions)
Description: The ReClor dataset consists of logical reasoning questions drawn from standardized exams such as the GMAT and LSAT, requiring models to read a short passage and choose the answer that follows logically. Chain-of-Thought (CoT) applies here by breaking each problem into multiple reasoning steps, so the model derives the correct answer along a more systematic and structured path. This step-by-step approach makes complex reasoning problems easier to understand and solve.

[Figure: example from ReClor]

AQUA-RAT (Algebra Question Answering with Rationales)

Publisher: DeepMind
Download Address: https://github.com/google-deepmind/AQuA
Release Year: 2017
Size: Approximately 500MB (contains about 100,000 reasoning questions)
Description: The AQUA-RAT dataset contains multiple-choice algebraic word problems, each paired with a natural-language rationale that explains the solution step by step. Chain-of-Thought (CoT) applies here directly: the model decomposes the reasoning process and works from the given information to a conclusion before choosing an option. Using CoT enables the model to handle more complex reasoning tasks and improves the accuracy and interpretability of its answers.

[Figure: example from AQUA-RAT]

Conclusion

In this installment of the Reasoning Dataset Sharing Series, we have focused on introducing diverse datasets based on the Chain-of-Thought (CoT) reasoning method. The Most Comprehensive Reasoning Dataset Sharing Series aims to provide researchers and developers with a rich collection of open-source datasets. In the future, we will continue to publish more articles on reasoning datasets, exploring more challenging reasoning tasks and helping everyone better understand and apply these datasets.