Day 6: Abstractions & Latents

Learning outcomes

Why do we care about abstractions?

  • Abstraction = discarding unnecessary information, keeping only what’s useful for prediction/goals. Analogous to coarse-graining in statistical mechanics
  • Human values are latent variables in world models, not physical states → transferring goals to AI requires translating latents (pointers problem)
  • Translation tractable only if agents converge on same abstractions → motivates uniqueness/agreement theorems
  • Abstractions should be robust to ontology shifts so that AI keeps caring about right things under world model revision
  • Good theory of abstraction should explain modular, interpretable decomposition of world models

Natural latents

  • Understand mediation and redundancy intuitively; understand minimal mediators and maximal redunds
  • Understand why the mediator determines the redund, and how these jointly pin down a unique natural latent
  • Understand the guaranteed translatability theorem and its alignment significance
  • Be able to reason about limitations of the framework

Condensation

  • Understand how condensation addresses organisation of knowledge into interpretable structure, not just compression
  • Understand perfect condensation and its agreement result; understand the correspondence with natural latents

Softwareness in the natural world & Factored Space Models

Partial information decomposition?

Algorithmic statistics?

Renormalization group?

Prerequisites

  • Background in statistical mechanics is useful for understanding the motivations for abstractions in general
  • (Ontology identification is potentially out of scope, but it informs the motivation for abstractions)
  • Familiarity with basic information-theoretic quantities and with properties of the KL divergence is assumed
  • Familiarity with Bayesian networks is helpful for understanding the diagrammatic proofs in natural latents
  • The concept of universal property in category theory is helpful for understanding uniqueness results in natural latents
  • Some background in measure theory is helpful for understanding condensation rigorously

Content

Fast-track

Minimal:


Standard

Abstraction is about throwing out information while only keeping the parts that are useful for achieving one’s goals or predicting the future — for instance, when predicting a star’s trajectory, only the total mass is relevant, not the exact configuration of particles inside it. This matters for alignment because humans’ world models throw out a lot of information about the physical world, and so the things we care about correspond to abstractions/latent variables in our world models rather than to precise low-level physical states. As argued in the pointers problem, transferring our goals to an AI therefore requires translating those latents into the AI’s world model. This is more tractable if a wide variety of agents converge on the same abstractions in some sense, motivating uniqueness/agreement theorems. We also want abstractions to be robust to ontology shifts, so an AI continues caring about the right things even as it radically revises its world model.

The natural latents framework formalises this via two conditions: a latent must mediate between observables (which become independent given the latent) and be redundant (recoverable from any individual observable). A key result is that the mediator determines the redundant variable, and these conditions jointly pin down a unique natural latent. The payoff is a guaranteed translatability theorem: if two agents both use natural latents, each agent’s latents are guaranteed to be a function of the other’s. The Bayes net algebra developed alongside this framework lets one reason about such latent structures diagrammatically, with clean approximate versions of each rule.
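The two conditions can be checked numerically on a toy distribution. The sketch below (a hypothetical illustration, not taken from the natural latents papers) builds a joint distribution in which a latent bit L is copied into each observable alongside independent noise, then verifies mediation (I(X1; X2 | L) = 0) and redundancy (H(L | Xi) = 0) directly from entropies:

```python
from itertools import product
from math import log2

# Toy joint distribution: latent L is a uniform bit; each observable
# X_i = (L, noise_i) carries a perfect copy of L plus independent noise.
dist = {}
for L, n1, n2 in product([0, 1], repeat=3):
    x1, x2 = (L, n1), (L, n2)
    dist[(L, x1, x2)] = 1 / 8  # outcome order: (L, X1, X2)

def H(idx):
    """Entropy (in bits) of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Mediation: I(X1; X2 | L) = H(X1,L) + H(X2,L) - H(X1,X2,L) - H(L)
mediation = H((1, 0)) + H((2, 0)) - H((0, 1, 2)) - H((0,))
# Redundancy: H(L | X_i) = H(L, X_i) - H(X_i), for each observable
redund1 = H((0, 1)) - H((1,))
redund2 = H((0, 2)) - H((2,))
print(mediation, redund1, redund2)  # all ≈ 0
```

Both conditions hold exactly here; in realistic settings one works with the approximate versions, where these quantities are merely small.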

Condensation addresses a complementary question: whereas information theory asks how to compress data efficiently, condensation asks how to organise it so that it is easy to use — forming discrete, interpretable conceptual structure rather than a compressed blob. It proves a similar agreement result: different approximately-efficient condensations will posit approximately isomorphic latent variables.

Softwareness in the natural world approaches abstraction from a computational angle. It formalises causal closure and computational closure — conditions under which a macroscopic process is self-contained in its informational and interventional properties, much as software is self-contained relative to hardware.
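The flavour of informational closure can be conveyed with a toy micro-dynamics (a hypothetical illustration, not from the paper): the micro state is a pair (macro bit m, noise bit n), and the macro update m' = 1 - m ignores the noise. The macro level is then closed in the sense that the noise carries no extra information about the next macro state:

```python
from itertools import product
from math import log2

# Micro state = (macro bit m, noise bit n), both uniform.
# Deterministic macro update m' = 1 - m; the noise is irrelevant to it.
dist = {}
for m, n in product([0, 1], repeat=2):
    dist[(m, n, 1 - m)] = 1 / 4  # outcome order: (m, n, m_next)

def H(idx):
    """Entropy (in bits) of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Closure check: I(m_next; n | m) = H(m_next,m) + H(n,m) - H(m_next,n,m) - H(m)
closure = H((2, 0)) + H((1, 0)) - H((0, 1, 2)) - H((0,))
print(closure)  # ≈ 0: the macro dynamics are self-contained
```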

Finally, factored space models extend Bayesian networks to handle deterministic relationships, which arise naturally when macro-level variables are functions of micro-level ones. Standard causal graphs break down because they cannot faithfully represent certain deterministic relationships (such as XOR); factored space models resolve this by expressing the sample space as a Cartesian product, enabling a faithfulness condition that Bayesian networks cannot satisfy in this setting.
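The XOR failure of faithfulness is easy to exhibit concretely. In the sketch below (standard example, computed directly from the joint distribution), X and Y are independent uniform bits and Z = X XOR Y: every pair of variables is independent, yet any two determine the third, so no Bayes net over the three variables can represent exactly these independencies:

```python
from itertools import product
from math import log2

# X, Y independent uniform bits; Z = X XOR Y.
dist = {}
for x, y in product([0, 1], repeat=2):
    dist[(x, y, x ^ y)] = 1 / 4  # outcome order: (X, Y, Z)

def H(idx):
    """Entropy (in bits) of the marginal over the given coordinate indices."""
    marg = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(a, b):
    """Mutual information between two coordinates."""
    return H((a,)) + H((b,)) - H((a, b))

pairwise = [I(0, 1), I(0, 2), I(1, 2)]  # each ≈ 0: pairwise independent
# Total correlation: H(X) + H(Y) + H(Z) - H(X,Y,Z) = 1 bit of purely
# three-way dependence, invisible to any pairwise test.
joint = H((0,)) + H((1,)) + H((2,)) - H((0, 1, 2))
print(pairwise, joint)
```

Any DAG over {X, Y, Z} either asserts an independence that fails jointly or an edge implying a pairwise dependence that does not hold, which is the faithfulness breakdown that factored space models are designed to repair.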

Teaching guide

  • 1 hour of introductory lecture on motivation and existing work on abstractions
  • 2.5 hours of readings and discussions on natural latents, condensation, and possibly factored space models/algorithmic statistics/renormalization group
  • 3 hours of exercises