Thinking Machines Lab wants to make AI models predictable: “the same query — the same answer”
Startup Thinking Machines Lab, led by former OpenAI CTO Mira Murati, has announced work on eliminating one of the most frustrating problems with language models: the unpredictability of their responses. Even with identical queries, users today often get different results, and the team set out to investigate why.
In their blog, the company’s engineers published the study Defeating Nondeterminism in LLM Inference, which explains in detail why nondeterminism arises and how it can be minimized.

It turned out the issue is not just about model settings like temperature. Even with temperature = 0, when the system should always select the most probable token, results can vary. The cause lies in the computation itself: GPUs execute work in parallel across many cores, those cores complete their tasks asynchronously, and floating-point arithmetic is not associative, so combining partial results in a different order yields a slightly different answer. These microscopic hardware-level differences compound, token by token, into noticeable variation in the generated text.
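The non-associativity at the heart of this is easy to reproduce without any GPU. A minimal sketch in plain Python (double-precision floats) shows that merely regrouping the same addition changes the result:

```python
# Floating-point addition is not associative: regrouping the same
# three numbers gives different answers, because the small term 0.1
# is absorbed and lost when it is added to the huge 1e20 first.
a, b, c = 0.1, 1e20, -1e20

left_to_right = (a + b) + c   # 0.1 vanishes into 1e20, which then cancels
right_to_left = a + (b + c)   # the huge terms cancel first, 0.1 survives

print(left_to_right)  # 0.0
print(right_to_left)  # 0.1
```

On a GPU the grouping is decided by how work is split across cores and combined, which is exactly the order the hardware does not guarantee from run to run.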
Thinking Machines Lab proposed several engineering solutions. One involves ensuring so-called batch invariance, meaning a given request's result does not depend on how many other requests happen to be processed alongside it in a batch. Another relates to simplifying computational kernels and controlling the execution order of GPU operations. According to the researchers, combining these approaches should make model outputs stable and reproducible.
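As a toy illustration of the batch-invariance idea (a sketch in plain Python, not the company's actual GPU kernels; both function names are hypothetical): a reduction whose internal split follows the batch size can return different answers for the same data, while a reduction whose split is fixed once returns the same answer no matter how requests are batched.

```python
def batch_dependent_sum(xs, batch_size):
    # The reduction is split according to the incoming batch size:
    # a different batch size means a different accumulation order,
    # and with floats that can mean a different result.
    partials = [sum(xs[i:i + batch_size]) for i in range(0, len(xs), batch_size)]
    return sum(partials)

def batch_invariant_sum(xs, batch_size):
    # Batch-invariant version: the internal split is fixed once and
    # never depends on batch_size, so the accumulation order (and
    # therefore the answer) is identical for every batch size.
    FIXED_SPLIT = 4
    partials = [sum(xs[i:i + FIXED_SPLIT]) for i in range(0, len(xs), FIXED_SPLIT)]
    return sum(partials)

xs = [0.1] * 10
print(batch_dependent_sum(xs, 1))   # 0.9999999999999999
print(batch_dependent_sum(xs, 2))   # different grouping, slightly different value
print(batch_invariant_sum(xs, 1) == batch_invariant_sum(xs, 2))  # True
```

The design point is simply that the answer must be a function of the data alone, not of the scheduling decisions the server makes under load.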
The practical importance of this step is hard to overestimate. For scientific research, reproducibility is key to trust in findings. For business and government systems, predictability is crucial: in legal opinions, medical diagnostics, or financial operations, models cannot afford to respond differently to the same query. For the general public too, trust in AI is often undermined precisely because of the sense of randomness and arbitrariness in responses.
Thinking Machines Lab is betting that these new methods will turn language models from inspiring but inconsistent conversationalists into reliable tools, capable of providing identical answers where consistency truly matters. This represents a move from the chaotic magic of machine creativity toward engineering rigor — and if successful, it could significantly boost trust in AI systems.

