# Unlocking AI's Potential: MCTSr Enhances Mathematical Reasoning

Original Title

## Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

- Cornell University

## Introduction to Large Language Models and Mathematical Reasoning

In recent years, the rapid advancement of artificial intelligence has led to the development of

## Integrating MCTS and LLMs for Improved Mathematical Reasoning

To address the challenges that LLMs face in complex mathematical reasoning tasks, researchers have proposed integrating them with a decision-making algorithm called

The researchers have developed a novel algorithm called

### The Self-Refine Process

The self-refine process involves an iterative refinement of an answer to a problem. The model first generates a reflective or critical comment about the initial answer, and then uses that feedback to produce an improved version of the answer.

### Self-Evaluation and Q-values

The key concept in the self-evaluation process is the

### Backpropagation and Node Selection

The backpropagation process updates the reward values and Q-values of the leaf nodes, and then propagates these changes up to the parent and ancestor nodes. This refines the parent node's Q-value by considering both its current value and the best possible outcome from its child nodes, providing a more accurate assessment of the node's quality.

The algorithm then selects the leaf nodes and those that are not fully expanded for the next round of choices. A node is considered "fully expanded" when it has reached a predefined limit on the number of children and at least one child node has a Q-value exceeding the parent node's Q-value. This strategy aims to identify the nodes that are likely to yield higher-value answers in subsequent searches, improving the overall search efficiency and outcome quality.

### Termination Function

The MCTSr algorithm uses a termination function to determine when to stop the search process. This function can be based on various criteria, such as early stopping when search results show diminishing improvements or repetitive outcomes, search constraints that limit the number of

## Evaluating the Performance of MCTSr

The researchers have evaluated the performance of the MCTSr algorithm on several benchmark datasets, including the

### GSM Benchmarks

The results on the GSM8K and GSM-Hard test sets show that increasing the number of MCTSr rollouts directly correlates with higher success rates, particularly for the less complex GSM8K set. However, the more intricate GSM-Hard set exhibited a performance ceiling even at higher rollout counts, indicating the limitations of current strategies in tackling complex problems.

### MATH Benchmark

On the MATH dataset, which is divided into five levels of difficulty, the 8-rollouts MCTSr configuration achieved the highest overall success rate of 58.24%, solving 2912 out of 5000 problems. This represents a substantial improvement from the initial 24.36% success rate of the

### Olympiad-level Benchmarks

The MCTSr algorithm was also evaluated on three datasets from mathematical Olympiad competitions: AIME, GAIC Math Odyssey, and OlympiadBench. The results showed a clear correlation between increased rollouts and higher success rates, highlighting the algorithm's potential to improve performance through iterative refinement. The GAIC Math Odyssey results, in particular, demonstrated the MCTSr's strong generalization abilities in new environments, suggesting its applicability in educational technologies for competitive academic settings like Olympiads.

## Conclusion

The MCTSr algorithm represents a significant advancement in the integration of large language models and mathematical reasoning. By combining the systematic exploration capabilities of Monte Carlo Tree Search with the self-refine and self-evaluation abilities of LLMs, the MCTSr algorithm has shown remarkable improvements in problem-solving success rates across a wide range of mathematical benchmarks, including challenging Olympiad-level problems.

This research paves the way for the further integration of AI technologies to enhance decision-making and reasoning accuracy, with potential applications in educational technologies, automated reasoning, and beyond. While the current algorithm has demonstrated its effectiveness, the researchers note that the broader applicability of the MCTSr approach in areas like black-box optimization and self-driven alignment remains to be explored in future studies.