
Why AI should reason like a scientist



by Noumenal Labs

The tl;dr

  • Generative AI and large language models (LLMs) are built on feedforward architectures, namely the diffusion model and the transformer. They have yielded exciting breakthroughs in the AI space

  • However, generative AI and LLMs are optimized for rote prediction and pattern completion, which limits their usefulness: they can predict the next data point, but they cannot generate an explanation of the data

  • The autocomplete-like functionality of generative AI and LLMs surely has pragmatic utility, but it does not enhance our understanding of the data. Thus, human decision making cannot be extended by such systems. 

  • Noumenal Labs is pioneering model architectures that generate the kind of simplified, composable explanations of data that are provided by the scientific method 

  • The AI systems that we are developing reason about the world like a scientist does

  • Scientists not only predict data, but also compress data, generating simplified explanations, testing hypotheses, and resolving uncertainty about the causes of the data 

  • This means going beyond mere data prediction, towards composable, object-centered explanations of the world

  • This is the true path to the design of machine intelligences that increase our understanding of the world, empower our decision making, and enable us to develop new technologies  

Introduction

In this blog post, we discuss the difference between prediction and explanation in the context of machine intelligence — and why their difference matters if our aim is to use machine intelligence to enhance human understanding and empower human decision making. We focus on a key limitation of state-of-the-art systems: They are designed for rote prediction and pattern completion — neither of which is sufficient for genuine understanding and explanation. 


We argue that an ideal machine intelligence should implement not only what we have learned about the brain, but also the methods that we use to study it. Scientific investigation — perhaps the most sophisticated form of human understanding — is about data prediction and also, crucially, data compression. Prediction is ultimately about understanding the probability of future outcomes, given what has happened in the past and a hypothesis-testing intervention. Explanatory data compression, on the other hand, is about generating a simplified representation of the data that is amenable to compositional model generation. This kind of simplified, object-centered representation is what enables the kind of systems engineering that leads to our greatest technological breakthroughs, and it is the crucial missing ingredient in state-of-the-art AI.
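To make the distinction concrete, here is a minimal sketch, in Python, of what we mean by explanatory compression: a couple of hundred noisy observations of a spring are compressed into a single interpretable parameter (the stiffness in Hooke's law), which both predicts new outcomes and can be composed with other force laws. The scenario and all numbers are invented for illustration.

```python
# Minimal sketch: "explanatory compression" as fitting a simple law to data.
# A data set of (extension, force) pairs is compressed into a single
# interpretable parameter k (Hooke's law F = k * x), which both predicts new
# outcomes and composes with other force laws. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations of a spring: extensions (m) and noisy forces (N).
x = rng.uniform(0.0, 0.1, size=200)
true_k = 150.0                         # "ground truth" stiffness, unknown to the model
f = true_k * x + rng.normal(0.0, 0.5, size=200)

# Rote prediction: memorize the data and answer with the nearest example.
def predict_by_lookup(x_new):
    return f[np.argmin(np.abs(x - x_new))]

# Explanatory compression: 200 data points reduced to one interpretable number.
k_hat = np.sum(x * f) / np.sum(x * x)  # least-squares estimate of the stiffness

print(f"estimated stiffness k ~ {k_hat:.1f} N/m")
print("lookup prediction at x=0.05:", predict_by_lookup(0.05))
print("law-based prediction at x=0.05:", k_hat * 0.05)
```

The fitted stiffness is a tiny, human-readable summary of the data that can be reused wherever forces are summed; the lookup table, by contrast, predicts without explaining anything.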


The gap between mere prediction and genuine understanding


It is a truism that state-of-the-art approaches in machine learning are optimized for rote prediction and pattern completion. The capacities unlocked by this technology — to predict the next word in a sentence or complete an image or video — are certainly quite impressive.


But neither prediction nor pattern completion is sufficient for, or equivalent to, genuine understanding. Indeed, there is a marked difference between merely being able to predict the next data point (for instance, the next word in a sentence, or the next frame in a video) and being able to generate a simplified, human-understandable explanation of why the outcome turned out the way that it did. 


Generative AI systems are unstructured models of data — that is, they are a description of p(x), where x is the data. Generative AI models are essentially correlational models, which extract and encode correlations between data points, usually in extremely large, multimodal datasets. This class of models excels at prediction precisely because it models the data directly. This is clearly evident in the diffusion models that are standardly used for image generation, which represent the probability distribution over images. This distribution is constructed by starting with a noisy image and transforming it, via a feedforward neural network, into an image that is as much as possible like those in the training set. To the extent that such a model has an internal representation of the image, that representation is just a noisy version of the original. This representation is convenient because it is marginally Gaussian, and therefore easy to compute with, but it certainly does not compress or explain the original image.
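The following sketch of the forward (noising) process in a DDPM-style diffusion model illustrates the point: the "latent" at every step has exactly the same shape as the image and is just a noise-corrupted copy of it, and by the final step it is essentially an isotropic Gaussian sample. The noise schedule and image dimensions are illustrative, not taken from any particular system.

```python
# Minimal sketch of the forward (noising) process in a DDPM-style diffusion
# model. The "latent" at every step has the same shape as the image and is
# just a noise-corrupted copy of it, not a compressed or structured
# explanation. Schedule and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)

image = rng.uniform(0.0, 1.0, size=(32, 32, 3))      # stand-in for a training image

T = 1000
betas = np.linspace(1e-4, 0.02, T)                   # common linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)                  # cumulative signal retention

def noisy_latent(x0, t):
    """q(x_t | x_0): scale the image down and add Gaussian noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

for t in (0, 250, 999):
    x_t = noisy_latent(image, t)
    # Same dimensionality as the input at every step; by t = T-1 the signal
    # weight is essentially zero and x_t carries no image structure at all.
    print(t, x_t.shape, round(float(np.sqrt(alpha_bar[t])), 4))
```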


In the linguistic domain, transformer-based architectures — what are known as large language models (LLMs) — suffer from the very same deficiency. Their design precludes the generation of explanations of data. They take in a very long list of words and predict the next one, using the transformer architecture to generate output words given a prompt (the input words). Again, there is no explanation here, just brute-force prediction. The LLM can only complete the prompt; it cannot simplify or explain what is in the prompt. Indeed, the context representation used for prediction is just as long as the original prompt. In other words, there is no explicit summary or simplification of the inputs of the sort that could count as an explanation, just a refinement or processing of the representation of the input string. While this autocomplete-like functionality has pragmatic utility, it does not enhance our understanding of the data. Thus, human decision making cannot be extended by such systems.  
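A toy, randomly initialized version of causal self-attention makes the shape argument concrete: the context representation keeps one vector per prompt token, and the next word is read off the last position; nothing in the pipeline produces a shorter, structured summary of the prompt. Dimensions and weights below are arbitrary stand-ins, not any real model.

```python
# Minimal sketch of next-token prediction with causal self-attention, using
# random weights, to show that the context keeps one vector per input token
# and the next word is read off the last position. No summary is ever formed.
import numpy as np

rng = np.random.default_rng(0)

vocab, d, n = 1000, 64, 12
prompt = rng.integers(0, vocab, size=n)              # token ids standing in for a prompt

E = rng.normal(0, 0.02, size=(vocab, d))             # toy embedding table
Wq, Wk, Wv = (rng.normal(0, 0.02, size=(d, d)) for _ in range(3))

X = E[prompt]                                        # (n, d): one vector per prompt token
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)
mask = np.triu(np.ones((n, n), dtype=bool), k=1)     # causal mask: no attending to the future
scores = np.where(mask, -1e30, scores)

attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

H = attn @ V                                         # (n, d): still one state per prompt token
logits = H[-1] @ E.T                                 # next-token scores from the last position

print(H.shape)                                       # (12, 64): as long as the prompt, no summary
print(int(logits.argmax()))                          # the predicted "next word"
```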


The situation is actually a bit worse. An LLM completes the prompt, responding with what is statistically the most likely way to complete it. Many state-of-the-art LLMs generate what appear to be explanations for the output that they have produced. But crucially, LLMs complete the prompt in the way that an average speaker of the language would. These outputs are not explanations, but mere rationalizations of the output. They supply what a generic speaker would most likely say if you forced them to ‘provide an argument’ for why that output was generated. This is neither the actual reason that the output was generated, nor the manner in which an expert would explain it. This is not understanding. 


From prediction to explanation and understanding  


Understanding generally — and scientific understanding in particular — is not merely about prediction and pattern completion. As discussed, scientific investigation is certainly about resolving uncertainty about the causal relationship between past and future events, but more importantly, it is also about producing simplified explanations of why observable outcomes (data) turned out the way that they did. These explanations are simplified and composable. A complex phenomenon is reduced to the simplest possible explanation, but in a way that allows that explanation to be combined with other explanations of a given class. For example, we can reduce all mechanical systems to a compact and simplified description in terms of how they respond to forces. In this setting, forces provide the common language that allows us to build complex things from our understanding of how simple things interact; for instance, building an airplane from well understood things like engines and airfoils.
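As a toy illustration of that common language, the sketch below composes a few separately understood force models (gravity, thrust, lift, drag) into a single system simply by summing their forces. The component models and every number in them are deliberately simplified placeholders, not an aerodynamics reference.

```python
# Minimal sketch of composition through a shared language of forces: each
# component is a simple, separately understood model that returns a force,
# and the composite system is just their sum (Newton's second law).
# All component models and numbers are toy-level illustrations.
import numpy as np

def gravity(mass, g=9.81):
    return np.array([0.0, -mass * g])

def engine_thrust(throttle, max_thrust=120e3):
    return np.array([throttle * max_thrust, 0.0])

def airfoil_lift(speed, rho=1.2, area=120.0, cl=0.9):
    return np.array([0.0, 0.5 * rho * cl * area * speed**2])

def drag(speed, rho=1.2, area=120.0, cd=0.03):
    return np.array([-0.5 * rho * cd * area * speed**2, 0.0])

# The "airplane" is nothing more than the composition of its parts' forces.
mass, speed, throttle = 70e3, 80.0, 0.8
net_force = gravity(mass) + engine_thrust(throttle) + airfoil_lift(speed) + drag(speed)
acceleration = net_force / mass

print("net force (N):", net_force)
print("acceleration (m/s^2):", acceleration)
```

Because every component speaks the same language, swapping in a better engine or wing model changes one function without disturbing the rest of the explanation.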


At Noumenal Labs, we are focused on explainable models, with inspiration drawn from two main sources: the evolution and development of intelligences like ours, and the means that we deploy to study them — scientific investigation. We want AI models that behave like little scientists. That is, we don’t want an AI to merely generate plausible sounding explanations, complete a sentence, or finish a video or image. Rather, we want machine intelligence that is capable of generating causal, object-centered, simplified explanations of the data that are composable and can guide the invention and engineering of novel solutions. We want AI models that provide explanations which give insight into the causal relationships that hold in a given system, allowing us not only to predict future outcomes but also to intervene in the domain in a causally efficacious way. While state-of-the-art generative AI is impressive in many respects, this is something it is not designed to achieve. 


We believe that the ideal artificial agent should behave like a curious scientist, using the data that it generates through carefully designed experiments to test its hypotheses about the world, its structure and its components.
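To give a flavor of what such a loop might look like at its very simplest, here is a hedged sketch: an agent maintains a posterior over two competing hypotheses about a black-box system, chooses the probe where its hypotheses disagree most (a crude stand-in for expected information gain), runs the experiment, and updates its beliefs by Bayes' rule. The hypotheses, noise model, and numbers are all invented for illustration and are not a description of our architecture.

```python
# Minimal sketch of a "curious scientist" loop: maintain a posterior over
# competing hypotheses (two candidate response laws for a black-box system),
# pick the probe whose outcome is expected to be most informative, observe,
# and update beliefs by Bayes' rule. Everything here is illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Two hypotheses about the system's response to an input u (noise std = 1).
hypotheses = {
    "linear":     lambda u: 2.0 * u,
    "saturating": lambda u: 10.0 * np.tanh(0.3 * u),
}
posterior = {name: 0.5 for name in hypotheses}
true_law = hypotheses["saturating"]                  # unknown to the agent

def expected_disagreement(u):
    """Proxy for expected information gain: belief-weighted disagreement at u."""
    preds = np.array([h(u) for h in hypotheses.values()])
    w = np.array(list(posterior.values()))
    mean = np.sum(w * preds)
    return np.sum(w * (preds - mean) ** 2)

candidates = np.linspace(0.0, 20.0, 41)
for _ in range(5):
    u = max(candidates, key=expected_disagreement)   # design the experiment
    y = true_law(u) + rng.normal(0.0, 1.0)           # run it and observe the outcome
    # Bayes update with a Gaussian likelihood for each hypothesis.
    like = {n: np.exp(-0.5 * (y - h(u)) ** 2) for n, h in hypotheses.items()}
    z = sum(posterior[n] * like[n] for n in hypotheses)
    posterior = {n: posterior[n] * like[n] / z for n in hypotheses}

print(posterior)                                     # belief concentrates on the true law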

 
 

Copyright 2025 Noumenal Labs, Inc.
