
Filling the gaps in active inference


by Noumenal Labs


The tl;dr

  • In this blog post, we discuss active inference. Specifically, we discuss key gaps in state-of-the-art applications of active inference in artificial intelligence — and how Noumenal Labs is working to fill them.

  • Active inference is a form of Bayesian machine learning that replaces the objective function used in traditional machine learning — the cost or reward function — with an information theoretic surprise minimization objective, which is used for both inference and learning. 

  • While theoretically well grounded, active inference has yet to yield state of the art performance in practice. 

  • This gap in performance is largely due to the relative simplicity of the generative models typically used in the active inference literature, including their underlying state spaces and the relationships defined over them. Generically, these models lack the requisite structure to represent the world as we understand it, i.e., as composed of distinct objects and object types, as well as object-type-specific interactions, unless that structure is hand-specified by the user.

  • Noumenal Labs is developing a new approach to macroscopic physics discovery which enables us to build machine intelligences that discover the different objects/concepts and object/concept types that generate time series data, as well as the type-specific rules that govern their behavior — doing so directly from data, in an unsupervised manner. 

  • Noumenal Labs’s approach is to build models that are explicitly structured — in a manner that is consistent both with the means by which we intuitively understand and conceptualize the world, and the means by which our best scientific models describe it.

An overview of active inference


Here, we discuss our view of active inference: its enormous promise, the key limitations of the current formulation, and how our work at Noumenal Labs is filling the gaps. Active inference is a relatively new approach in the field of artificial intelligence. It was first introduced in the late 2000s in theoretical neurobiology and computational neuroscience, where it has become a leading approach within the Bayesian Brain community. Active inference has generated a lot of excitement lately, especially where it concerns applications to industry, in fields like robotics and artificial intelligence. 

  

Active inference is sometimes described by researchers in the field as a complete alternative to traditional machine learning. While active inference clearly has several advantages over reinforcement learning, which we discuss below, this is an exaggeration. Active inference is more accurately described as one among several closely related approaches within the broader class of Bayesian approaches to agent design.  

Comparing active inference and reinforcement learning


There are some key differences between active inference and reinforcement learning. It is insightful to compare the different setups formally at a high level. 


State of the art active inference and model based reinforcement learning agents are usually implemented as partially observable Markov decision processes (POMDPs).

Here, one defines: 

  • A state space S, which models the underlying hidden state of the environment 

  • A likelihood model P(o | s), which represents the probability of observing a given outcome o, conditioned on the assumption that the system is in state s

  • A set of actions A, which the agent can take in the environment, and which affect transitions between states in S

  • A model of transition probabilities P(s’ | s, a), which represents the probability at time t of transitioning from one state s to a subsequent state s’ via action a

  • A reward or objective function, R(o, s’, s, a), that motivates goal directed behavior by assigning a relative value or cost to observations, actions, and states


With this basic model setup in place, the aim in either case is for an agent to learn the optimal policy, p(a | s), with respect to some objective. In reinforcement learning, the objective used is always a reward function, which assigns a scalar value to observations, actions, and transitions from s to s’. This reward function is hand crafted by the experimenter for the purposes of the specific use case, and the various reinforcement learning techniques generically constitute different ways of computing, approximating, or directly learning this optimal policy.
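
To make this setup concrete, here is a minimal sketch of those ingredients in Python. The class and variable names are ours, chosen for illustration; this is not a specific active inference or reinforcement learning library, just the discrete POMDP bookkeeping described above together with a standard Bayesian belief update.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class DiscretePOMDP:
    likelihood: np.ndarray   # P(o | s):      shape (num_obs, num_states)
    transition: np.ndarray   # P(s' | s, a):  shape (num_states, num_states, num_actions)
    reward: np.ndarray       # R(s, a):       shape (num_states, num_actions); used in RL only
    prior: np.ndarray        # P(s) at t = 0: shape (num_states,)

def belief_update(model: DiscretePOMDP, belief: np.ndarray,
                  action: int, obs_index: int) -> np.ndarray:
    """One step of Bayesian filtering: predict through P(s' | s, a), then condition on P(o | s')."""
    predicted = model.transition[:, :, action] @ belief     # q(s') before seeing the outcome
    posterior = model.likelihood[obs_index] * predicted     # unnormalized q(s' | o)
    return posterior / posterior.sum()
```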


Active inference also makes use of the POMDP formulation. The main difference between active inference and reinforcement learning is how the objective function is defined: active inference replaces arbitrary reward functions with an information theoretic surprise minimization objective, called the variational free energy. This quantity is a function of observed data and a generative model of how those data are generated. It scores the mismatch between the data the model expects and the data actually observed, and it upper-bounds surprise (the negative log evidence). Of note, this objective function is not unique to active inference — in many sectors of machine learning, we optimize the negative of this quantity (the evidence lower bound or ELBO), and precisely the same objective is used in maximum entropy inverse reinforcement learning, where it is known as the marginal cross entropy function.
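
For readers who prefer code to prose, here is one way to compute this quantity for a single discrete observation. The function and variable names are illustrative, and the identity in the docstring is the standard textbook one rather than anything specific to a particular active inference implementation.

```python
import numpy as np

def variational_free_energy(q_s, prior_s, likelihood, obs_index, eps=1e-16):
    """F = E_q[ln q(s) - ln p(o, s)] for a single discrete observation.
    Equivalently KL[q(s) || p(s | o)] - ln p(o), so F upper-bounds the
    surprise -ln p(o); its negative is the evidence lower bound (ELBO)."""
    log_joint = np.log(likelihood[obs_index] + eps) + np.log(prior_s + eps)  # ln p(o, s) per state
    return float(np.sum(q_s * (np.log(q_s + eps) - log_joint)))
```

When q_s is the exact posterior p(s | o), the free energy equals the surprise, which is the sense in which minimizing free energy minimizes surprise.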


From a practical modeling and experimental perspective, the nice thing about using this objective function is that it combines a risk-sensitive, goal seeking drive with a drive to seek new information. Indeed, (negative) variational free energy can be decomposed into an expected utility term plus an information gain term. Active inference agents that minimize variational free energy end up selecting policies that optimally balance the drive to resolve uncertainty about the current state of affairs against the drive to achieve their goals.
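
In the discrete setting, this balance is usually made explicit through the expected free energy of a candidate action, which can be written as a risk term (divergence of predicted outcomes from preferred outcomes) plus an ambiguity term (expected observation uncertainty). That form is closely related to the expected utility plus information gain reading above; the sketch below assumes it, with illustrative names.

```python
import numpy as np

def expected_free_energy(q_s_a, likelihood, log_pref_o, eps=1e-16):
    """G(a) = risk + ambiguity for a single candidate action a:
       risk      = KL[ q(o | a) || p(o) ]      (pull toward preferred outcomes)
       ambiguity = E_{q(s | a)} H[ p(o | s) ]  (penalty for uninformative states)
    q_s_a:      predicted state distribution after taking a, shape (num_states,)
    likelihood: P(o | s), shape (num_obs, num_states)
    log_pref_o: log of the preferred outcome distribution, shape (num_obs,)"""
    q_o_a = likelihood @ q_s_a                                        # predicted outcomes q(o | a)
    risk = float(np.sum(q_o_a * (np.log(q_o_a + eps) - log_pref_o)))
    entropy_per_state = -np.sum(likelihood * np.log(likelihood + eps), axis=0)
    ambiguity = float(q_s_a @ entropy_per_state)
    return risk + ambiguity

# Actions (or policies) are then typically scored with a softmax over negative
# expected free energy, so lower-G options are selected more often.
```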


Perhaps more significantly, the move from a reward based objective to an information theoretic objective has profound implications for how we study and design physical systems using modeling techniques from machine learning. As discussed above, reward functions in reinforcement learning are arbitrary and hand-specified by the user — meaning that, to model some physical object via reinforcement learning, the user must manually specify the relative value of all possible outcomes. This raises the question: how would we know what outcomes a network of neurons or E. coli find rewarding, and how would we place numerical values on those outcomes? The short answer is that you can’t. Put plainly, there is no normative solution to the problem of reward function selection.


In physics, there is no reward function. It would be absurd to attempt to design a function to describe what non-agentic objects like rocks, protons, or fluid elements find “rewarding”. This is what makes physics a normative theory of how the world works. Of course, we anthropomorphize objects all the time, speaking about them in quasi-agentic terms, for instance describing inert objects using quasi-intentional or teleological language like “the ball rolls down the hill because it seeks the lowest energy state”. Nonetheless, we would not be inclined to actually impute a goal to the ball: it does what it does because it is a ball. 


The crucial insight of the free energy principle is that the difference between an agent and a physical system is merely one of complexity — and since physical systems require no explicit reward function, neither should a model of an agent. Physical systems are defined by their energy functions. Interestingly, the variational free energy objective has the same functional form as the free energy in statistical physics — and can be derived from first principles for any physical system. The implication is profound: just as we write down energy functions for physical systems, we can write down free energy functions for agents. This is perhaps the deepest insight that we can draw from active inference and the associated literature on Bayesian mechanics. Just as a pendulum is defined by a quadratic potential function, an active inference agent is determined by a free energy functional, which defines the behavior of a class of objects — what has been called the “ontological potential function” of that object class. This makes the active inference approach normative in the same way that physics is normative. 
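
To make the parallel in functional form explicit, here are the two expressions side by side, in generic notation (with temperature set to one in the physics case):

```latex
% Statistical physics: variational free energy of a trial distribution q(s)
% over configurations s with energy function E(s), temperature set to one
F_{\mathrm{phys}}[q] = \mathbb{E}_{q(s)}[E(s)] - H[q(s)]

% Active inference: variational free energy of beliefs q(s) about hidden states,
% given observations o; the negative log joint -\ln p(o, s) plays the role of the energy
F_{\mathrm{AIF}}[q] = \mathbb{E}_{q(s)}[-\ln p(o, s)] - H[q(s)]
```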


While active inference models do not feature an explicit reward function, they do feature some notion of a goal that can be interpreted teleologically. In physics, energy functions specify the steady state distribution of a system, and free energy measures the energy dissipated along the way to steady state. In the same way, active inference agents are defined, in part, by a steady state distribution that characterizes their “homeostatic objective”. Just as a physical system “aims” to relax to its steady state distribution, an active inference agent aims to achieve a kind of generalized steady state equilibrium with its environment. The nature of this steady state effectively defines an agent by its relationship with the environment and provides an agent specific ontological potential function. So while active inference does not technically have a reward function, it does include a means of motivating the type of behavior that defines an agent of a particular type.


To summarize, in model based reinforcement learning and in active inference, agents are implemented as POMDPs and learn to select an optimal policy — in one case, by maximizing a reward function, and in the other, by minimizing surprise conditioned on a generative model. Active inference broadens the scope of the objective function used in reinforcement learning, augmenting the expected utility or reward term with a term that induces a drive to seek out new information. And more importantly, it fundamentally changes the manner in which we approach modeling systems using techniques from machine learning.

A POMDP is a framework, not a world model  


All this sounds awesome. So, what is the gap in active inference? It is this: despite claims to the contrary, there are effectively no world models of any real sophistication in the vast majority of the state of the art active inference literature. Underlying state dynamics are almost uniformly assumed to be vanilla hidden Markov models or their temporally hierarchical cousins.


This class of models is technically generative, but its impoverished representation of the world prevents it from accurately describing sophisticated situations. In our view, the POMDPs used in state of the art active inference would be massively improved by incorporating an explicit object-based representation of the world, with objects represented via high dimensional dynamic embeddings of the sort typically used in modern generative AI approaches. The discrete POMDPs used in active inference are also structurally incapable of learning physical laws directly from data. This is because it is mathematically impossible to define, and computationally intractable to approximate, the relevant symmetries in a discrete state space like that of a vanilla POMDP.

 

To be fair, some active inference models have been used to model the world in a somewhat object centered way, but with objects and their relations explicitly defined by the user rather than discovered from data. This results in POMDPs that only represent features of the world by virtue of a semantic interpretation that is imposed onto the model by the user. As a result, what makes a collection of states or parameters of a POMDP count as a distinct object is nothing but a fiat decision by the modeler. This is not at all aligned with the highly effective, data-driven approaches used in modern machine learning and runs the risk of imposing unjustifiable biases and eliminating the possibility of discovering novel insights and relationships directly from data.

Closing the gap  


Until recently, this arbitrariness and lack of genuine world modeling has been a core problem for active inference, albeit one that is not often recognized within the field. The technology that we are developing at Noumenal Labs allows us to deliver on the promissory notes of active inference, by allowing us to learn grounded world models replete with objects, object types, and their properties, all represented in a manner consistent with the structure of the world in which we live. This is accomplished by distilling from mechanics and systems engineering the fundamental structure that guides empirical scientific inquiry.


The macroscopic physics discovery technology that we have developed at Noumenal Labs fills this core gap in active inference — allowing us to learn sophisticated world models grounded in real world data. The macroscopic physical rules that are of interest to us generalize the notion of force in information theoretic terms, allowing us to formulate object-type-specific rules of interaction between objects. Ultimately, this approach is guided by the way in which scientists identify and classify objects, and learn causal, dynamic, relational, and composable object-centered representations directly from data. 


The main mathematical tool used to accomplish this is an information theoretic notion of a boundary, known as a Markov blanket. In statistics, the Markov blanket of a set of “internal” variables is a set of “blanket” variables that establish conditional independence between the internal set and all other variables in the system. In the free energy principle literature, the Markov blanket is used to formalize mathematically the notion of object. The reason is that the statistics of the Markov blanket variables provide a complete description of inputs and outputs to an object, and thereby fully characterize the interactions between that object and the other objects in its environment. This formalizes the behavioral profile of a subsystem in terms of the effects that it has on other subsystems, and vice versa. 
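
As a small illustration of the statistical definition (and not of the Noumenal Labs pipeline itself), in a directed graphical model the Markov blanket of a node is its parents, its children, and its children's other parents; conditioned on those blanket variables, the node is independent of everything else.

```python
import numpy as np

def markov_blanket(adj, i):
    """Markov blanket of node i in a directed graphical model.
    adj[j, k] == 1 encodes a directed edge j -> k; the blanket is
    parents(i), children(i), and the other parents of those children."""
    parents   = set(np.flatnonzero(adj[:, i]))
    children  = set(np.flatnonzero(adj[i, :]))
    coparents = {p for c in children for p in np.flatnonzero(adj[:, c])}
    return (parents | children | coparents) - {i}

# Example: in the chain 0 -> 1 -> 2 -> 3, node 1's blanket is {0, 2};
# conditioned on those blanket variables, node 1 is independent of node 3.
adj = np.zeros((4, 4), dtype=int)
adj[0, 1] = adj[1, 2] = adj[2, 3] = 1
assert markov_blanket(adj, 1) == {0, 2}
```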


For our purposes, Markov blankets are crucial because they specify the kind of structure that should be built into the objects that populate a sophisticated world model capable of representing macroscopic objects, their properties, and their interactions. We have used the Markov blanket formalism to develop an approach to macroscopic physics discovery — enabling the identification and typification of the “objects” directly from data, which can then be used to populate a grounded world model. This approach is generative, but with an explicit representation of underlying objects, much like how a video game engine represents the world as a collection of assets. In that domain, the rules that govern object interactions are inspired by Newton’s laws and are represented using forces. The utility of our approach is that we can generalize the notion of force to learn simple but effective rules that govern the behavior of macroscopic objects directly from data. Moreover, because object and object role labels are dynamic, this approach allows one to naturally capture complex behaviors associated with objects that evolve, reproduce, pop into and out of existence, coalesce into bigger objects, and exchange matter with their environment (e.g., eat). This fills the key gap in active inference, enabling us to learn sophisticated grounded world models directly from data. 


In summary, Noumenal Labs’s approach to macroscopic physics discovery enables the discovery of grounded world models that agree with our scientific understanding. This technology closes the most crucial gap in state of the art active inference, namely, how to identify and classify objects in the first place — and how to learn causal, dynamic, relational object-centered representations. 

 
 
