What is Q* (Q-Star)?

November 27, 2023

Q* (pronounced Q-Star) is a rumored OpenAI project that may represent a breakthrough on the path to Artificial General Intelligence (AGI), a term for autonomous systems capable of outperforming humans at most economically valuable tasks. Reports of its emergence within OpenAI have sparked significant excitement in the AI community.

[Update July 2024: OpenAI is reportedly working on new reasoning technology under the code name ‘Strawberry’, which appears to be a continuation of Q-Star.]

The Significance of Q*

It is speculated that OpenAI's Q* could revolutionize generative AI. Unlike current AI systems, which excel at language and writing tasks, Q* apparently shows promise in mathematical problem-solving, a field where answers are definitively right or wrong and not subject to interpretation. If true, the development suggests that AI is inching closer to human-like reasoning abilities, potentially unlocking new possibilities in scientific research and beyond.

The Tree-of-Thoughts Approach

A key component of Q*, it is hypothesized, is so-called Tree-of-Thoughts (ToT) reasoning. Instead of committing to a single chain of reasoning, this method lets the AI branch into several candidate paths, evaluate them, and continue only with the most promising ones, leading to more nuanced and sophisticated problem-solving. It is a step forward in AI's ability not only to generate answers but also to build up a reasoning process akin to human thought.
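
To make the branching idea concrete, here is a minimal sketch of a tree search over "thoughts" on a toy number puzzle. It is not Q*'s method, whose details are not public. The function names (propose, score, tree_of_thoughts), the two allowed operations, and the beam width are all assumptions made for illustration. In a real system, a language model would propose and evaluate the candidate steps.

```python
from typing import List, Optional, Tuple

# A state is (current value, steps taken so far). Purely illustrative.
State = Tuple[int, Tuple[str, ...]]


def propose(state: State) -> List[State]:
    """Hypothetical thought generator: branch into two candidate next steps."""
    value, steps = state
    return [
        (value + 3, steps + ("+3",)),
        (value * 2, steps + ("*2",)),
    ]


def score(state: State, target: int) -> float:
    """Hypothetical evaluator: a state closer to the target scores higher."""
    return -abs(target - state[0])


def tree_of_thoughts(
    start: int, target: int, beam_width: int = 3, max_depth: int = 6
) -> Optional[Tuple[str, ...]]:
    """Breadth-first search that keeps only the best few partial paths per level."""
    frontier: List[State] = [(start, ())]
    for _ in range(max_depth):
        # Stop as soon as any surviving path has reached the target.
        for state in frontier:
            if state[0] == target:
                return state[1]
        # Expand every surviving path, then prune to the top candidates.
        candidates = [child for state in frontier for child in propose(state)]
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam_width]
    # Final check on the last level that was generated.
    for state in frontier:
        if state[0] == target:
            return state[1]
    return None


print(tree_of_thoughts(start=1, target=11))
```

The search explores several paths at once and discards weak ones at each level, which is the core difference from following a single chain of reasoning from start to finish.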

Process Reward Models (PRM)

The authors of this post on the Q* hypothesis also believe that Process Reward Models are integral to Q*'s development. Traditional reward models assign a single score to an entire output, but PRMs take a more granular approach by scoring each step of the reasoning process. This allows finer-grained guidance when generating solutions, which is particularly effective in structured problem-solving tasks like mathematics and physics.
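
The difference between scoring only the outcome and scoring every step can be shown with a toy example. A real PRM is a learned model. The rule-based checker below (step_is_valid) is only a stand-in, and all function names and the "a op b = c" step format are assumptions made for this sketch.

```python
from typing import List


def step_is_valid(step: str) -> bool:
    """Toy stand-in for a learned PRM: check one 'a op b = c' arithmetic step."""
    left, right = step.split("=")
    a_text, op, b_text = left.split()
    a, b, c = int(a_text), int(b_text), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return results[op] == c


def outcome_reward(steps: List[str], expected_answer: int) -> float:
    """Single score for the whole solution: only the final answer is checked."""
    final = int(steps[-1].split("=")[1])
    return 1.0 if final == expected_answer else 0.0


def process_rewards(steps: List[str]) -> List[float]:
    """One score per reasoning step, as a process reward model would provide."""
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]


# A solution whose final answer matches the expected value although the
# first step is wrong.
solution = ["2 + 3 = 6", "6 * 4 = 24", "24 - 4 = 20"]

print(outcome_reward(solution, expected_answer=20))  # 1.0
print(process_rewards(solution))                     # [0.0, 1.0, 1.0]
```

The outcome score rates the solution as fully correct, while the per-step scores pinpoint the faulty first step. That extra signal is what makes step-level scoring attractive for structured problems.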

The Future of Q*

While many details of Q* are still shrouded in speculation, it appears that it may combine PRMs and ToT in a novel way, optimized with Reinforcement Learning (RL). Q* is believed to leverage vast computing resources so that the AI itself can assign scores to each reasoning step, a task traditionally done by humans. This approach has the potential to bridge the gap between current AI capabilities and the much-anticipated realm of AGI.
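
As a rough illustration of how step-level scores could guide the choice between competing reasoning paths, the sketch below ranks candidate paths by an aggregate of their per-step scores. The candidate names, the score values, and the use of a product as the aggregate are all assumptions. How Q* actually combines these pieces, and how RL enters the picture, is not publicly known.

```python
import math
from typing import Dict, List


def path_score(step_scores: List[float]) -> float:
    """Aggregate per-step scores into one path score (product is an assumed choice)."""
    return math.prod(step_scores)


def best_path(candidates: Dict[str, List[float]]) -> str:
    """Return the name of the candidate path with the highest aggregate score."""
    return max(candidates, key=lambda name: path_score(candidates[name]))


# Hypothetical per-step scores for three candidate reasoning paths.
candidates = {
    "path_a": [0.9, 0.9, 0.2],
    "path_b": [0.8, 0.7, 0.9],
    "path_c": [0.95, 0.4, 0.9],
}

print(best_path(candidates))  # path_b
```

With a product, a single weak step drags down the whole path, so a path with consistently solid steps wins over one that starts strongly and then falters.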