CINNs? ChINNS? We need a better acronym.
Physics-informed learning holds a lot of potential for computational science. In particular, physics-informed neural networks (PINNs) and physics-informed neural operators (PINOs) offer an alternative way to solve differential equations. They promise mesh-free and principled solutions, but are non-trivial to train and often less accurate than classical solvers like the finite element and finite volume methods. Still, because of this promise, physics-informed learning has already found its way into real industrial use cases. I highly recommend checking out AI in Engineering, Physics, Aerodynamics and specifically this post if you want to read more about these cases, their successes and limitations.
My thesis is that physics-informed learning can be applied much more broadly in science than just solving differential equations. In fact, the methodology can be used to simulate or solve any quantitative model, as long as there is sufficient structure. We just need a different name than ‘physics’-informed models. Depending on the application, they might be called chemistry-informed, biology-informed, medicine-informed, or perhaps most broadly first-principles-informed machine learning (FPIML - we need a better acronym).
The issue is that in physics, the setup is often relatively straightforward. You may have one differential equation and one unknown, and the task is to solve that equation using a neural network-based approximation. The goal may be speed, accuracy, or perhaps obtaining a good initial guess for a classical solver. In chemistry, however, the situation is broader. There are many quantitative tasks, and not all of them fit into the framework of solving differential equations. There are still rigorous, quantitative laws describing those problems, but those laws are not always expressed in the form of PDEs or ODEs.
The Physics of Chemistry
I recently started thinking a lot about physics-informed learning in the context of chemistry. Chemistry is perhaps the field most closely related to physics. Much of chemistry can be described using the language of physics: the configurations of atoms in molecules and their chemical reactions theoretically follow the rules of quantum mechanics. The energy of a molecular configuration arises from interactions between nuclei, interactions between nuclei and electrons, and the electronic ground state energy, with excited states entering if needed. This is, broadly speaking, the kind of picture formalized by the Born–Oppenheimer approximation and used in quantum chemistry and ab initio molecular dynamics.
On a higher level, many decades of research have gone into building force fields that describe the motion of atoms and molecules according to Newtonian mechanics and statistical physics. These are purely quantitative descriptions using equations, energies, gradients and mathematical structure. The question begging to be asked is: can physics-informed learning improve our ability to simulate chemistry?
One example I’ve been spending a lot of brain cycles on is computing transition states, activation energies, and reaction pathways. Chemical reactions are, at their core, rearrangements of atoms and bonds, and those rearrangements must obey quantitative physical laws. Indeed, from a first-principles point of view, molecular dynamics can be described by a potential energy function $V(x)$ that associates an energy value to any atomic configuration $x \in \mathbb{R}^{N \times 3}$. This energy is precisely the result of the quantum mechanical interactions mentioned above. The atoms experience a force given by the negative gradient of this potential
\[F(x) = -\nabla V(x)\]and move according to Newton’s second law. Due to thermal fluctuations, this motion is random, but can be well-described by (overdamped) Langevin dynamics
\[dx = -\nabla V(x)\, dt + \sqrt{2 T} \, dW(t)\]with $T$ the temperature in appropriate units and $W(t)$ a standard Brownian motion representing the randomness.
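To make this concrete, here is a minimal sketch of overdamped Langevin dynamics simulated with the Euler–Maruyama scheme. The two-dimensional double-well potential $V(x, y) = (x^2 - 1)^2 + y^2$ and all names below are illustrative assumptions for this post, not part of any chemistry package:

```python
import numpy as np

# Assumed toy double-well potential V(x, y) = (x^2 - 1)^2 + y^2,
# with minima at (-1, 0) and (1, 0) and a saddle at the origin.
def grad_V(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def langevin(p0, T=0.1, dt=1e-3, n_steps=10_000, seed=0):
    """Euler-Maruyama discretization of dx = -grad V(x) dt + sqrt(2T) dW."""
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    traj = [p.copy()]
    for _ in range(n_steps):
        p += -grad_V(p) * dt + np.sqrt(2.0 * T * dt) * rng.standard_normal(2)
        traj.append(p.copy())
    return np.array(traj)

# Start in the left well; at low temperature the particle mostly
# fluctuates around (-1, 0), with rare hops over the saddle.
traj = langevin([-1.0, 0.0])
```

At low temperature, trajectories like this spend almost all their time near the minima, which is exactly why direct simulation is a poor way to find reaction paths and why MEP-based methods exist.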
If the reactants and products are known, then the reaction mechanism is not a vague or qualitative object. Given atomic positions of the reactants and products, $x_R, x_P \in \mathbb{R}^{N \times 3}$, where the same $N$ atoms partake in both configurations, the minimum energy path (MEP) $x(s)$ between reactants $x(0) = x_R$ and products $x(1) = x_P$ is such that the local force field has no component perpendicular to the path. Writing
\[\tau(s) = \frac{x'(s)}{\lVert x'(s) \rVert}\]as the normalized tangent along the path, the minimum energy path satisfies
\[\left(I - \tau(s)\tau(s)^T\right)\, \nabla V(x(s)) = 0. \qquad \tag{1}\]In other words, the tangent to the path is parallel to the local energy gradient, i.e. the force. In addition, if the reactants and products are exact local minima of the potential, the MEP connecting them will pass through a saddle point of the potential energy surface, also known as the transition state. If $x_R$ or $x_P$ is not exactly metastable, the MEP will pass close to the theoretical transition state.
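Equation (1) is easy to check numerically on a discretized path. The sketch below uses an assumed toy double-well potential $V(x, y) = (x^2 - 1)^2 + y^2$, whose MEP between the two minima is the straight segment along the $x$-axis; tangents are estimated with central differences:

```python
import numpy as np

def grad_V(p):
    # Assumed toy double well V(x, y) = (x^2 - 1)^2 + y^2
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def mep_residual(path):
    """Residual (I - tau tau^T) grad V of equation (1) at the interior
    points of a discrete path, with central-difference tangents."""
    res = []
    for i in range(1, len(path) - 1):
        tau = path[i + 1] - path[i - 1]
        tau = tau / np.linalg.norm(tau)
        g = grad_V(path[i])
        res.append(g - tau * (tau @ g))   # (I - tau tau^T) g
    return np.array(res)

s = np.linspace(0.0, 1.0, 41)
straight = np.stack([2.0 * s - 1.0, np.zeros_like(s)], axis=1)  # (-1,0) -> (1,0)
print(np.max(np.abs(mep_residual(straight))))  # 0.0: this path satisfies (1)
```

For this landscape the straight path has zero perpendicular force everywhere, so the residual vanishes identically; a bent path would not.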
Long story short: chemical reactions are also governed by the energy landscape, the force field, and by principled equations. That means it should be possible to approach reaction-path computation with principled computational tools. And that raises the obvious question: could physics-informed neural networks – or perhaps better, chemistry-informed networks – be used to solve for these reaction paths directly?
It should be noted that equation (1) does not pose a mathematically well-defined problem by itself, even with Dirichlet boundary conditions $x(0) = x_R$ and $x(1) = x_P$. There are two main reasons. First, the problem is parametrization-invariant: if $x(s)$ satisfies (1), any monotone reparameterization $x(\phi(s))$ will too. This can be avoided by also demanding that $\lVert x'(s) \rVert$ remains constant along the path, a form of regularization. Second, the MEP is not necessarily unique. Two or more paths may connect the same endpoints, or none may exist at all.
Even so, equation (1) forms the basis for many quantitative chemical methods. Nudged elastic band (NEB) methods, string methods, gentlest ascent dynamics, and related algorithms all try to compute trajectories and stationary structures close to the MEP. The nudged elastic band method, for instance, minimizes a combination of the molecular force field and an artificial spring force that keeps instances of the molecule distributed uniformly along the path. String methods do something similar, but in a different formulation. Methods like Gentlest Ascent Dynamics (GAD) rely more directly on gradient and Hessian information of the potential energy surface $V(x)$ and are therefore much more expensive.
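For intuition, a bare-bones NEB iteration fits in a few lines. This is a deliberately simplified sketch on the same assumed toy double well (naive central-difference tangents, plain gradient steps; production NEB codes use upwind tangents and proper optimizers):

```python
import numpy as np

def grad_V(p):
    # Assumed toy double well V(x, y) = (x^2 - 1)^2 + y^2
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def neb_relax(path, k=1.0, step=0.01, n_iter=2000):
    """Relax interior images with the perpendicular true force plus a
    spring force along the tangent that keeps images evenly spaced."""
    path = path.copy()
    for _ in range(n_iter):
        new = path.copy()
        for i in range(1, len(path) - 1):
            tau = path[i + 1] - path[i - 1]
            tau = tau / np.linalg.norm(tau)
            g = grad_V(path[i])
            f_perp = -(g - tau * (tau @ g))             # true force, perpendicular part
            f_spring = k * (np.linalg.norm(path[i + 1] - path[i])
                            - np.linalg.norm(path[i] - path[i - 1])) * tau
            new[i] = path[i] + step * (f_perp + f_spring)
        path = new
    return path

s = np.linspace(0.0, 1.0, 11)
bent = np.stack([2.0 * s - 1.0, 0.5 * np.sin(np.pi * s)], axis=1)  # bad initial guess
relaxed = neb_relax(bent)  # relaxes toward the straight-line MEP
```

Even this crude version recovers the straight-line MEP from a bent initial guess, but note the method's characteristic costs: many force evaluations per iteration, and a fixed, discrete set of images rather than a continuous path.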
CINNs? ChINNs?
This is where “chemistry-informed networks” could become interesting. Chemistry-informed neural networks and operators may present an easier-to-optimize, less delicate, and faster alternative to these methods. The NEB and string methods require a good initial guess of the reaction path, they work with discrete images rather than a continuous path representation, and they often require numerical care to converge to something physically plausible. GAD is an elegant idea but very expensive. A chemistry-informed operator could learn the transition path as a (continuous) function of the normalized arclength $s$
\[\text{NN}_{\theta} : s \mapsto x(s) \qquad \tag{2}\]where $\theta$ are the trainable parameters. The network output is subject to Dirichlet boundary conditions because the transition path must start at the reactant state and end at the product state. Of course, we are sweeping a lot of difficulty under the rug here because training (2) for many molecules and transition pathways would be hard – very hard.
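One practical detail is how to enforce the Dirichlet boundary conditions exactly: rather than penalizing them in the loss, the path can be parameterized so that any network output satisfies them. A minimal numpy sketch, where the random-Fourier-feature “network”, its size, and the toy 2D endpoints are all hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x_R = np.array([-1.0, 0.0])   # reactant configuration (toy, 2D)
x_P = np.array([1.0, 0.0])    # product configuration

# Hypothetical tiny "network": fixed random Fourier features
# with a trainable linear head theta.
n_feat, dim = 16, 2
freqs = rng.normal(size=n_feat)
theta = rng.normal(size=(n_feat, dim))

def path(s, theta):
    """x_theta(s) = (1-s) x_R + s x_P + s(1-s) NN_theta(s).
    The s(1-s) factor enforces x(0) = x_R and x(1) = x_P exactly,
    for any value of the trainable parameters."""
    s = np.atleast_1d(s)
    feats = np.sin(np.outer(s, freqs))          # (len(s), n_feat)
    base = np.outer(1.0 - s, x_R) + np.outer(s, x_P)
    return base + (s * (1.0 - s))[:, None] * (feats @ theta)

ends = path(np.array([0.0, 1.0]), theta)  # exactly x_R and x_P
```

Hard-coding the boundary conditions this way removes one term from the loss entirely, which mirrors a common trick in PINN training.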
The main question is what the loss function should be.
A data-driven model for transition states or reaction paths – for example a diffusion model or some direct supervised predictor – requires training data. That data usually comes from expensive classical computations such as nudged elastic band calculations or other path-sampling methods. In other words, the data-driven approach often depends on exactly the expensive computational machinery it is trying to bypass.
A chemistry-informed network could avoid that dependency. Instead of needing a large precomputed dataset, it would need only loss evaluations based on the governing chemistry. Let’s take a step back to physics-informed neural nets. There the loss function is simple: take the neural network output, push it through the PDE operator and compute the residual. The exact solution has zero residual and the network’s job is therefore to minimize this residual in the hopes of approximating the exact solution. The chemistry-informed analogue would be to compute the residual (squared) of equation (1) along the trajectory $x(s)$ and use that as a loss. Leaving non-uniqueness issues to the side, this approach seems plausible.
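As a toy illustration that the squared residual of (1) behaves like a loss, here is a sketch on an assumed double-well potential $V(x, y) = (x^2 - 1)^2 + y^2$ with a one-parameter family of candidate paths; the loss is minimized exactly by the true MEP:

```python
import numpy as np

def grad_V(p):
    # Assumed toy double well V(x, y) = (x^2 - 1)^2 + y^2; its MEP
    # between (-1, 0) and (1, 0) is the straight segment y = 0.
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def residual_loss(a, n=41):
    """Mean squared residual of (1) over the candidate path
    x(s) = (2s - 1, a sin(pi s)), with central-difference tangents."""
    s = np.linspace(0.0, 1.0, n)
    path = np.stack([2.0 * s - 1.0, a * np.sin(np.pi * s)], axis=1)
    loss = 0.0
    for i in range(1, n - 1):
        tau = path[i + 1] - path[i - 1]
        tau = tau / np.linalg.norm(tau)
        g = grad_V(path[i])
        r = g - tau * (tau @ g)
        loss += r @ r
    return loss / (n - 2)

amps = np.linspace(-1.0, 1.0, 21)
best = amps[np.argmin([residual_loss(a) for a in amps])]
print(best)  # the minimizing amplitude is a = 0, i.e. the straight path
```

In a real chemistry-informed setup, the scan over a single amplitude would of course be replaced by gradient-based training of the network parameters $\theta$.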
But maybe there is a better approach. I’ve recently been encouraged by variational methods for training physics-informed neural networks. My previous post explores this idea for one particular example, that of learning the deformations of a 2D plate, but the principle carries over to chemistry-informed networks as well. Here is the idea: in PDEs, one often replaces a strong residual formulation by a variational one. Instead of enforcing the equation pointwise, one minimizes an integral functional whose minimizer solves the same problem in a weaker sense. This weak form typically requires only lower-order derivatives of the network output, which makes minimizing the variational form much cheaper than solving the strong-form PDE while targeting the same solution. What is the equivalent of an ‘energy functional’ for reaction paths?
One of the key results in reaction-path theory is that, for gradient systems, the MEP minimizes the action along the path
\[\mathcal{S}[x] = \int_{0}^1 \left( \lVert\nabla V(x(s))\rVert\,\lVert x'(s) \rVert - \nabla V(x(s)) \cdot x'(s) \right) ds, \qquad x(0)=x_R,\quad x(1)=x_P. \tag{3}\]The principle of least action is fundamental in physics and carries over to transition-path chemistry. Could the action be used as the loss function in chemistry-informed learning?
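The action (3) is straightforward to discretize. In this sketch (same assumed toy double well as before; the midpoint rule is one of several reasonable quadrature choices), the true MEP has a smaller action than a bent competitor, and its value matches the analytic $2\,(V_{\text{saddle}} - V_{\text{product}}) = 2$ for this particular landscape:

```python
import numpy as np

def V(p):
    # Assumed toy double well: minima at (-1, 0), (1, 0), saddle at (0, 0).
    x, y = p
    return (x**2 - 1.0)**2 + y**2

def grad_V(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

def action(path):
    """Midpoint-rule discretization of the action functional (3)."""
    S = 0.0
    for i in range(len(path) - 1):
        mid = 0.5 * (path[i] + path[i + 1])
        dx = path[i + 1] - path[i]
        g = grad_V(mid)
        S += np.linalg.norm(g) * np.linalg.norm(dx) - g @ dx
    return S

s = np.linspace(0.0, 1.0, 201)
mep = np.stack([2.0 * s - 1.0, np.zeros_like(s)], axis=1)          # straight MEP
bent = np.stack([2.0 * s - 1.0, 0.5 * np.sin(np.pi * s)], axis=1)  # detour path
print(action(mep), action(bent))  # the MEP has the smaller action
```

Note that the discretized action needs only force evaluations and finite differences of the path, with no second derivatives of $V$, which is what makes it attractive as a training loss.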
A few thoughts about the action formalism. First, note that we didn’t lose a derivative in going from the ‘strong form’ (1) to the action principle: both involve $\nabla V$ and $x'$. This is not a problem per se because the force field (or $\nabla V$) is, in a sense, the fundamental object, but it does break the symmetry with PINNs somewhat. Second, by switching from data-driven learning to chemistry-informed learning, we need direct access to the force field itself.
This is where things get interesting. Much of computational chemistry has been about designing good force fields for molecular dynamics. Classical families such as CHARMM, AMBER, OPLS, GROMOS, and Rosetta have had enormous success, but each comes with its own domain of applicability. Reaction chemistry tests the limits of these force fields. Another option is to use direct quantum-mechanical evaluations of the energy $V(x)$, the force $-\nabla V(x)$ and the action (3). Conceptually clean but ridiculously expensive. Machine-learning potentials offer a good middle ground by learning force field parameters from QM simulation data, but to my (limited) knowledge, no all-purpose force fields have been constructed yet.
That said, the existence of a good force field is a broader problem in the computational chemistry literature and not particular to computing transition paths. If we had a sufficiently good force field or machine-learning potential, a chemistry-informed approach to reaction paths could be very realistic.
Closing Thoughts
What excites me is that this framework may offer a solution where data-driven methods struggle. Datasets for reaction-path prediction are relatively sparse, hard to generate, and limited by the numerical methods used to create them. A chemistry-informed approach replaces the need for large labeled datasets with direct access to the physical structure of the problem.
I want to start exploring this idea further. My intuition is that reaction-path and transition-state prediction are probably the most natural first targets, because they are so closely tied to first-principles physics and chemistry. But I suspect this may not be the only area where chemistry-informed networks could matter. There may be other problems in computational chemistry where these methods offer an advantage over both purely data-driven models and classical iterative solvers.
I’d be happy to discuss these ideas further and collaborate on training chemistry-informed neural networks and operators. I hope to meet you all along the journey!