Large Language Models 3: Key Behaviors

EXPLAINING KEY LLM BEHAVIORS

This is the third post in a series. You’re probably best starting here to get the background.

Emergent behaviors and planning capabilities

When our rovers explore the terrain, they don’t take each step blindly. Instead, they’re constantly scanning the horizon and communicating about promising features in the distance.

For example, when asked to write a rhyming poem, it’s like the rovers spot a distinctive terrain feature in the distance (the potential rhyming word) and then chart a course that will eventually lead there. They might identify multiple potential valleys or grooves (“rabbit,” “habit,” “grab it”) and their programming selects the most promising one based on which path would create the most natural journey.

This explains how modern LLMs appear to plan ahead - they’re not just selecting one step at a time but identifying important destination features early in the exploration and working backward to find natural paths to them. The rovers aren’t consciously planning, but their exploration algorithms allow them to coordinate complex journeys that appear remarkably purposeful.
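
To make that concrete in code rather than metaphor, here’s a toy sketch of “spot the destination first, then steer toward it”: score a handful of candidate rhyme words up front and commit to the most promising one before writing the rest of the line. The candidate list, the scoring heuristic and the example line are all invented for illustration - real models do this implicitly through their learned probabilities, not with an explicit function like this.

```python
# Toy sketch of "pick the distant feature first, then chart a path to it".
# Everything here (candidates, scoring heuristic, example line) is made up.

candidates = ["rabbit", "habit", "grab it"]

def terrain_score(line_so_far: str, rhyme_word: str) -> float:
    """Stand-in for the rovers' scan of the terrain ahead: how naturally could
    a line ending in rhyme_word continue from line_so_far? A real model would
    use its learned probabilities; we fake it with a crude heuristic."""
    overlap = len(set(line_so_far.lower().split()) & set(rhyme_word.split()))
    return 1.0 / (1 + abs(len(rhyme_word) - 6)) + 0.1 * overlap

line_so_far = "He saw a carrot and he had to"

# Choose the destination feature (the rhyme word) first...
target = max(candidates, key=lambda w: terrain_score(line_so_far, w))

# ...then generate the rest of the line so that it leads there.
print(f"Chosen rhyme target: {target!r} - now steer the line toward it")
```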

As the model size increases (our Alps model becomes larger and more detailed), more sophisticated path-planning becomes possible because the terrain preserves more subtle relationships between concepts. What looks like “metacognition” emerges naturally from the increasingly detailed representation of language patterns in the terrain itself (shades of Clark and Chalmers’ extended mind!).

Confabulation

Confabulations occur when the rovers explore oversimplified terrain where compression has lost important details - typically when they’re asked for specific lookups, such as URLs or facts about non-famous people. Because of the compression, distinct features can get merged or simplified, so what should be the Matterhorn might look suspiciously like the Eiger from certain angles. Indeed, the terrain may even be distorted relative to reality.

This explains why LLMs sometimes blend facts or concepts - in their compressed representation, these distinct entities share features that make them difficult to distinguish without the full detail of the original.
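
A minimal sketch of why that blending happens, using made-up embedding vectors: after heavy compression, two distinct peaks can sit so close together that a nearest-neighbour lookup can barely tell them apart. The names, vectors and query below are all invented for illustration.

```python
import math

# Made-up "compressed" coordinates: two peaks that compression has pushed
# close together, plus one that stayed well separated.
embeddings = {
    "Matterhorn": [0.82, 0.41, 0.39],
    "Eiger":      [0.80, 0.44, 0.37],   # nearly identical after compression
    "Ben Nevis":  [0.10, 0.95, 0.05],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

query = [0.81, 0.42, 0.38]   # a question that "points at" the Matterhorn
scores = {name: round(cosine(query, vec), 4) for name, vec in embeddings.items()}
print(scores)
# Matterhorn and Eiger score almost identically, so a little noise in the query
# (or a little more compression) is enough to return the wrong peak - or a
# blend of the two.
```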

Agency (Or Lack Thereof)

It’s important to clarify that in this metaphor, the rovers don’t possess understanding or agency. Like a real Mars rover, they’re executing programmed instructions, not making conscious decisions or understanding the terrain they navigate. They’re automated instruments following mathematical operations that determine their path.

Wolfram’s Computational Irreducibility

Stephen Wolfram argues that LLMs face fundamental limits due to computational irreducibility: they can’t perform complex multi-step reasoning without external tools. This manifests in the metaphor in several ways.

Unlike human reasoning, which can switch between intuitive pattern-matching and step-by-step calculation, the rovers can only navigate the terrain they find - they have no ability to “stop and calculate.” When faced with problems requiring sequential logic (like matching parentheses) or mathematical operations, the rovers are limited to following next-move probabilities based on their scans.

For example, when matching parentheses, rovers might successfully navigate shorter sequences where the statistical patterns are clear in the terrain, but fail with longer sequences where the proper matching requires actual counting - something impossible in a single forward pass through the landscape. The terrain might have common patterns like “((...))” well-represented as valleys, but cannot encode the infinite combinatorial possibilities of arbitrary nesting.

This is why the rover team can write an essay that sounds intelligent but will fail to reliably solve math problems or perform logical operations that require maintaining state across many steps. Each rover step is just sampling the local probability landscape; they cannot accumulate computational state or iterate through algorithms.
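
To see what “actual counting” means here, this is the kind of stateful check an ordinary program does trivially: a running depth counter carried across the whole string. That persistent counter is exactly what a single forward pass over the terrain has nowhere to store. A minimal sketch:

```python
def is_balanced(s: str) -> bool:
    """Classic stateful check: one counter carried across every character."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # closed more than we opened
                return False
    return depth == 0          # everything opened was eventually closed

print(is_balanced("((()))"))                 # True  - short, common pattern
print(is_balanced("(" * 40 + ")" * 39))      # False - needs real counting,
                                             # not pattern-matching
```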

Reinforcement Learning

Reinforcement learning is the equivalent of performing extra LIDAR scans of a specific section of the Alps and rebuilding that section in greater detail on the NASA model. However, this can distort the model overall.

Tweaking the model for a specific use is like laying roads or footpaths down to get the rovers moving across the terrain more efficiently (though it may make them more likely to skip other answers and ideas).
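
Here’s a toy numerical sketch of that trade-off: boost the scores of the answers the tuning rewarded, renormalise, and watch every other answer become less likely even though nothing explicitly penalised them. The answer names, scores and boost are invented for illustration - real RLHF-style training is far more involved than this.

```python
import math

# Invented next-move scores over four candidate answers before tuning.
logits = {"answer_A": 1.2, "answer_B": 1.0, "answer_C": 0.8, "answer_D": 0.6}
rewarded = {"answer_A"}       # the behaviour the extra "scans" favoured
reward_boost = 1.5            # invented strength of the tuning signal

def softmax(d):
    exps = {k: math.exp(v) for k, v in d.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

before = softmax(logits)
after = softmax({k: v + (reward_boost if k in rewarded else 0.0)
                 for k, v in logits.items()})

for k in logits:
    print(f"{k}: {before[k]:.2f} -> {after[k]:.2f}")
# answer_A gets a smooth road laid toward it; B, C and D all become less
# likely, even though nothing about them was explicitly discouraged.
```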

Bias

Training data biases manifest as regions missing from the landscape, or as misshapen areas - as if the LIDAR scan was distorted by a tree or a stray reflection that corrupted the data for that part of the map.

Parallel Processing

The rovers’ parallel exploration and constant communication mirror how transformers process a whole sequence simultaneously: every position attends to every other position in a single pass, rather than being handled one step at a time.
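
For a feel of what “simultaneously” means, here’s a stripped-down attention sketch in plain Python: scores between every pair of positions are computed in one go, and each output is a weighted mix of all the inputs, with no step-by-step walk along the sequence. The vectors are random and the usual learned projections are omitted, so this is only the shape of the computation, not a faithful transformer layer.

```python
import math
import random

random.seed(0)
seq_len, dim = 4, 8   # four tokens, tiny embedding size - invented numbers
tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Attention scores for *every* pair of positions, computed in one pass -
# nothing here walks along the sequence one step at a time.
scores = [[dot(q, k) / math.sqrt(dim) for k in tokens] for q in tokens]
weights = [softmax(row) for row in scores]

# Each output position is a weighted mix of all the token vectors.
outputs = [
    [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
    for row in weights
]
print(len(outputs), "positions updated simultaneously")
```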

LLM Construction, Training and Gradient Descent

The Alps landscape isn’t static during training - it’s constantly shaped by gradient descent. It isn’t like a LIDAR scan that simply gets 3D printed in one go.

To create the model, NASA had to start with a flat plain of clay and then gradually sculpt it by repeatedly running water (guided by the LIDAR scans) across its surface.

Areas where many data streams converge would be eroded into valleys (low loss/high probability paths), while rarely visited areas would remain as peaks.

After millions of iterations of this erosion process, the landscape would encode the statistical patterns of the water flow - much as gradient descent shapes the weight landscape to minimize prediction error.
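
The erosion loop maps quite directly onto plain gradient descent: start flat, nudge the surface a little in whichever direction reduces the error, and repeat many times. Here’s a minimal one-parameter sketch with invented data, just to show the shape of the update rule rather than anything like real LLM training.

```python
# Minimal gradient descent: one "clay" parameter w, eroded a little on every
# pass over the data until it settles into a valley of low loss.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]   # made-up (x, y) pairs, roughly y = 2x
w = 0.0                                        # the flat plain of clay
learning_rate = 0.02                           # how much each "water flow" erodes

for step in range(500):
    # Gradient of the mean squared error of predicting y as w * x.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad                  # erode a little, downhill

print(round(w, 3))   # settles near 2.0 - a valley carved by many small erosions
```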
