GPT-4o mini represents a significant leap forward in deploying GenAI within our apps: it offers a solid base model at a substantial discount (60% cheaper than GPT-3.5 Turbo). That cost headroom lets us apply advanced prompting techniques to boost performance, and my tests over the weekend show that GPT-4o mini can outperform GPT-4 Turbo.
This, I believe, is where the real power of GPT-4o mini lies—leveraging this model to explore multi-step interactions that refine and enhance the outputs.
Take, for example, the Monte Carlo Tree Self-Refine (MCTSR) technique developed by Di Zhang, Xiaoshui Huang, Dongzhan Zhou, and Yuqiang Li at the Shanghai Artificial Intelligence Lab. They took a smaller model, Llama-3-8B, and got it to perform at levels comparable to GPT-4 by building on initial responses and refining them through iterative prompting, applying principles from Monte Carlo Tree Search (the same high-level idea used in AlphaGo).
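To make the idea concrete, here is a minimal sketch of the select-critique-refine loop at the heart of MCTSR. This is a simplified illustration, not the paper's exact algorithm: the `llm` callable is a hypothetical stand-in for whatever model API you use, and the scoring prompt is an assumption for demonstration purposes.

```python
import math

def mctsr(question, llm, iterations=8, c=1.41):
    """Simplified MCTS self-refine loop: each node is a candidate answer;
    expansion critiques the selected answer and rewrites it."""
    # Root node: a first draft answer to the question.
    nodes = [{"answer": llm(f"Answer: {question}"), "visits": 0, "reward": 0.0}]
    for _ in range(iterations):
        total = sum(n["visits"] for n in nodes) + 1

        # Selection: UCT balances exploiting high-scoring answers
        # against exploring less-visited ones.
        def uct(n):
            if n["visits"] == 0:
                return float("inf")
            exploit = n["reward"] / n["visits"]
            explore = c * math.sqrt(math.log(total) / n["visits"])
            return exploit + explore

        node = max(nodes, key=uct)

        # Expansion: self-critique, then rewrite guided by the critique.
        critique = llm(f"Critique this answer to '{question}': {node['answer']}")
        refined = llm(f"Rewrite the answer using this critique: {critique}")

        # Evaluation: have the model score the refined answer 0-10.
        score = float(llm(f"Score 0-10 (number only): {refined}"))

        # Backpropagation: credit the parent and register the new candidate.
        node["visits"] += 1
        node["reward"] += score
        nodes.append({"answer": refined, "visits": 1, "reward": score})

    # Return the candidate with the best average score.
    return max(nodes, key=lambda n: n["reward"] / max(n["visits"], 1))["answer"]
```

The extra quality comes purely from spending more inference-time tokens on critique and refinement, which is why a cheap model like GPT-4o mini makes this trade-off attractive.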
Over the past weekend, my experimentation with GPT-4o-mini and the MCTSR technique revealed some interesting findings: GPT-4o-mini, with iterative prompting, surpasses GPT-4 Turbo.
| Benchmark | GPT-4o mini + MCTSR | GPT-4 Turbo |
| --- | --- | --- |
| GSM8K | 96.7% | 93% (few-shot, k = 5, CoT) |
| GAIC (MathOdyssey) | 56.3% | 49.1% |
| MATH | 80.3% | 73.4% |
Here are the results I got with GPT-4o mini on some of the other benchmarks used in the MCTSR paper:
| Benchmark | GPT-4o mini + MCTSR |
| --- | --- |
| GSM-Hard | 63.4% |
| AIME | 37.8% |
| OlympiadBench | 30.4% |
Even though my tests cost about $140, that is still more economical than running the same benchmarks on GPT-4 Turbo.

Here are the counts of the questions that were attempted:
| Benchmark | Question count |
| --- | --- |
| GSM8K | 1319 |
| GSM-Hard | 1319 |
| GAIC | 389 |
| MATH | 5000 |
| AIME | 933 |
| OlympiadBench | 1275 |
Code:
All the code and JSONs from my benchmark runs with GPT-4o mini are saved at https://github.com/SidU/MathBlackBox.
Summary:
The ability of GPT-4o mini to tackle complex problems iteratively highlights its potential. Start simple, but consider iterative prompting techniques for better reasoning in complex problems and domains. If you do use iterative prompting, make sure your UX notifies users asynchronously when reasoning is complete rather than making them wait on your webpage/view for results: the model emits tokens to 'think' and 'reflect', which takes some time.
Learn more:
Monte Carlo Tree – Self Refine paper
YouTube video from Trelis Research with an explanation of the above paper
Great overview of Monte Carlo Tree Search
