Learning to Simulate

I came across this talk the other day and found the paper’s results astonishingly impressive. This DeepMind project can be found here.

High Level Overview - Simulation as Message-Passing on a Graph

First Step - Construct the graph

  • Particles within a small distance of each other are connected by edges, so each particle only interacts with its immediate neighbors.

  • Features are assigned to each of the nodes and each of the edges

    • Nodes: the particle’s mass, other material properties, the velocities from the previous five time steps, …
    • Edges: the distance between the two interacting particles, the spring constant, … (a minimal construction sketch follows below)
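
  A minimal sketch of this construction step, assuming 2-D positions, a five-step velocity history per particle, and an illustrative connectivity radius (the feature choices are stand-ins, not the paper’s exact ones):

  ```python
  import numpy as np

  def build_graph(positions, velocities, radius=0.05):
      """Connect particles closer than `radius` and collect simple node/edge features.

      positions:  (N, 2) particle positions
      velocities: (N, 5, 2) the five most recent velocities of each particle
      """
      n = positions.shape[0]
      senders, receivers, edge_features = [], [], []
      for i in range(n):
          for j in range(n):
              if i == j:
                  continue
              offset = positions[j] - positions[i]
              dist = float(np.linalg.norm(offset))
              if dist < radius:
                  senders.append(i)
                  receivers.append(j)
                  # Edge features: relative displacement and its magnitude.
                  edge_features.append(np.concatenate([offset, [dist]]))
      # Node features: flattened velocity history (material properties could be appended here).
      node_features = velocities.reshape(n, -1)
      return node_features, np.array(senders), np.array(receivers), np.array(edge_features)
  ```

  A real implementation would use a spatial hash or k-d tree for the neighbor search instead of this O(N²) double loop.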

Second Step - Message passing

  • Nodes in the graph send messages to the neighbors they are connected to
  • Each node then receives these messages from its neighbors

  • Each node updates its own representation using the messages it received (a one-step sketch follows below)
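
  One round of this send/aggregate/update pattern might look like the following sketch, where `edge_fn` and `node_fn` stand in for the small learned networks (here just callables supplied by the caller); this is an illustration, not the paper’s exact update:

  ```python
  import numpy as np

  def message_passing_step(node_emb, edge_emb, senders, receivers, edge_fn, node_fn):
      """One message-passing round (a sketch)."""
      # 1. Each edge builds a message from its sender, receiver, and its own embedding.
      messages = edge_fn(np.concatenate(
          [node_emb[senders], node_emb[receivers], edge_emb], axis=-1))
      # 2. Each node sums the messages arriving on its incoming edges.
      aggregated = np.zeros((node_emb.shape[0], messages.shape[-1]))
      np.add.at(aggregated, receivers, messages)
      # 3. Each node updates its own embedding using the aggregate it received.
      new_node_emb = node_fn(np.concatenate([node_emb, aggregated], axis=-1))
      return new_node_emb, messages  # messages serve as the updated edge embeddings
  ```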

Third Step - Decoder

  • Takes these updated node states and turns them into acceleration vectors
  • Final output of the network: the predicted acceleration of each particle at the current time step

Last Step

  • Use these acceleration vectors to update the particle velocities and positions, producing the next state (a minimal update sketch follows below)
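
  For example, a semi-implicit Euler-style update (a sketch of the kind of integrator used; the unit step size is an assumption, treating quantities in per-step units):

  ```python
  def integrate(position, velocity, acceleration, dt=1.0):
      """Semi-implicit Euler step: advance the velocity first, then the position."""
      new_velocity = velocity + dt * acceleration
      new_position = position + dt * new_velocity
      return new_position, new_velocity
  ```

  The updated positions and velocities are then used to build the next time step’s graph, so rollouts are produced by applying the model autoregressively.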

Modules

Encoder - Processor - Decoder

Encode, Process, Decode

ENCODER

  • Responsible for constructing the graph

  • Initializing the node and edge features

  • One detail: they don’t just use the raw features; they first pass them through dedicated neural networks that project them into a new space that facilitates learning in the downstream tasks

  • At the end you have the initial embeddings

  • No interactions between particles have happened at this point

  • ENCODER definition:

    The ENCODER: $\mathcal{X} \rightarrow \mathcal{G}$ embeds the particle-based state representation, $X$, as a latent graph, $G^0 = ENCODER(X)$, where $G = (V, E, \mathbf{u})$, $\mathbf{v}_i \in V$ and $\mathbf{e}_{i,j} \in E$.

    • The node embeddings, $\mathbf{v}_i = \varepsilon^v (\mathbf{x}_i)$, are learned functions of the particles’ states
    • The edge embeddings, $\mathbf{e}_{i,j} = \varepsilon^e (\mathbf{r}_{i,j})$, are learned functions of the pairwise properties of the corresponding particles
    • The graph-level embedding, $\mathbf{u}$, can represent global properties such as gravity and magnetic fields
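
  To make $\varepsilon^v$ and $\varepsilon^e$ concrete, here is a toy sketch using tiny untrained MLPs; the layer sizes and feature dimensions are illustrative assumptions, not the paper’s settings:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)

  def mlp(in_dim, hidden_dim, out_dim):
      """A tiny two-layer ReLU MLP with random (untrained) weights, standing in
      for the learned encoder networks."""
      w1 = rng.normal(size=(in_dim, hidden_dim)) * 0.1
      w2 = rng.normal(size=(hidden_dim, out_dim)) * 0.1
      return lambda x: np.maximum(x @ w1, 0.0) @ w2

  encode_nodes = mlp(in_dim=10, hidden_dim=64, out_dim=128)  # eps^v
  encode_edges = mlp(in_dim=3,  hidden_dim=64, out_dim=128)  # eps^e

  node_features = rng.normal(size=(50, 10))   # e.g. five past 2-D velocities per particle
  edge_features = rng.normal(size=(200, 3))   # e.g. relative displacement + distance

  node_emb = encode_nodes(node_features)      # v_i = eps^v(x_i)
  edge_emb = encode_edges(edge_features)      # e_ij = eps^e(r_ij)
  ```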

PROCESSOR

  • The number of graph network layers determines how far each message can propagate: e.g. with three layers, information can travel three hops away from a node

  • At the end of the process you have updated embeddings for each node that take into account its $m$-hop neighborhood, where $m$ is the number of graph network layers used in the processor

  • PROCESSOR definition:

    The PROCESSOR: $\mathcal{G} \rightarrow \mathcal{G}$ computes interactions among nodes via $M$ steps of learned message-passing, to generate a sequence of updated latent graphs, $\mathbf{G} = (G^1, …, G^M)$, where $G^{m+1} = GN^{m+1}(G^m)$. The final graph $G^M = PROCESSOR(G^0)$.
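
  A sketch of that stacking, assuming each `gn` layer is one message-passing step like the one sketched earlier (returning updated node and edge embeddings):

  ```python
  def processor(node_emb, edge_emb, senders, receivers, gn_layers):
      """Apply M graph-network layers in sequence: G^{m+1} = GN^{m+1}(G^m)."""
      for gn in gn_layers:
          node_emb, edge_emb = gn(node_emb, edge_emb, senders, receivers)
      return node_emb, edge_emb  # G^M: embeddings after M hops of propagation
  ```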

DECODER

  • Translates these updated embeddings into acceleration vectors

  • DECODER definition:

    The DECODER: $\mathcal{G} \rightarrow \mathcal{Y}$ extracts dynamics information from the nodes of the final latent graph, $\mathbf{y}_i = \delta^v(\mathbf{v}_i^M)$
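
  A toy read-out, assuming 128-dimensional final node embeddings and 2-D accelerations (both illustrative sizes), with $\delta^v$ stood in for by an untrained MLP:

  ```python
  import numpy as np

  rng = np.random.default_rng(1)

  # delta^v as a tiny untrained two-layer MLP, applied node-wise.
  w1 = rng.normal(size=(128, 64)) * 0.1
  w2 = rng.normal(size=(64, 2)) * 0.1
  decode = lambda v: np.maximum(v @ w1, 0.0) @ w2

  final_node_emb = rng.normal(size=(50, 128))   # stand-in for the v_i^M
  accelerations = decode(final_node_emb)        # y_i = delta^v(v_i^M), shape (50, 2)
  ```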

Why Graph Networks

What about regular neural networks?

  • First challenge: different particles interact with neighborhoods of different sizes
  • Second challenge: standard neural networks/RNNs are not permutation-invariant, but the real physical world is

Graphs do a few useful things

  • First: they encode what interacts with what
  • Second: they combine neighborhood information in a way that is permutation-invariant (see the small demo below)
  • Third: they are very data-efficient, because every particle and every edge is a training example for the same shared network parameters
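
A tiny demo of the second point: summing neighbor messages gives the same result no matter how the neighbors are ordered.

```python
import numpy as np

rng = np.random.default_rng(0)
messages = rng.normal(size=(6, 128))      # messages from a node's 6 neighbors
perm = rng.permutation(6)                 # reorder the neighbors arbitrarily

# Sum (or mean) aggregation is permutation-invariant.
assert np.allclose(messages.sum(axis=0), messages[perm].sum(axis=0))
```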

Use of Random Noise