Benchmarking Compositional Generalisation for Learning Inter-atomic Potentials

Amir Masoud Nourollah, Irtaza Khalid, Stefano Leoni, Steven Schockaert

January 2026

Abstract

Inter-atomic potentials play an important role in modelling molecular dynamics. Unfortunately, traditional methods for computing such potentials are computationally expensive. In recent years, the idea of using neural networks to approximate these computations has gained popularity, and a variety of Graph Neural Network and Transformer-based methods have been proposed for this purpose. Recent approaches provide highly accurate estimates, but they are typically trained and tested on the same molecules. It thus remains unclear whether these models mostly learn to interpolate the training labels, or whether their physically informed designs actually allow them to capture the underlying principles. To address this gap, we propose a benchmark consisting of four tasks that each require some form of compositional generalisation. Training and testing involve separate molecules, but the training data is chosen such that generalisation to the test examples should be feasible for models that learn the physical principles. Our empirical analysis shows that the considered tasks are highly challenging for state-of-the-art models, with errors on out-of-distribution examples often being orders of magnitude higher than on in-distribution examples.
