
Consider the energy, $E(\mathbf{x}_0)$, at some point $\mathbf{x}_0$ on a potential energy surface. We can express the energy at a nearby point $\mathbf{x}=\mathbf{x}_0+\mathbf{h}$ by means of the Taylor series

$$E(\mathbf{x}_0+\mathbf{h})=E(\mathbf{x}_0)+\mathbf{h}^t\left(\frac{dE(\mathbf{x}_0)}{d\mathbf{x}}\right)+\frac{1}{2}\,\mathbf{h}^t\left(\frac{d^2E(\mathbf{x}_0)}{d\mathbf{x}^2}\right)\mathbf{h}+\cdots \qquad \text{(A.1)}$$

If we knew the exact form of the energy functional $E(\mathbf{x})$ and all its
derivatives, we could move from the current point $\mathbf{x}_0$ directly to a
stationary point (*i.e.*, we would know exactly what the step $\mathbf{h}$ ought to
be). Since we typically know only the lower derivatives of $E(\mathbf{x})$ at best,
we can instead estimate the step $\mathbf{h}$ by differentiating the Taylor series with
respect to $\mathbf{h}$, keeping only the first few terms on the right-hand side,
and setting the left-hand side, $dE(\mathbf{x}_0+\mathbf{h})/d\mathbf{h}$, to zero, which
is the value it would have at a genuine stationary point. Thus

$$\frac{dE(\mathbf{x}_0+\mathbf{h})}{d\mathbf{h}}=\frac{dE(\mathbf{x}_0)}{d\mathbf{x}}+\left(\frac{d^2E(\mathbf{x}_0)}{d\mathbf{x}^2}\right)\mathbf{h}+\text{higher terms (ignored)} \qquad \text{(A.2)}$$

from which

$$\mathbf{h}=-\mathbf{H}^{-1}\mathbf{g} \qquad \text{(A.3)}$$

where

$$\frac{dE}{d\mathbf{x}}\equiv \mathbf{g}\ \text{(gradient vector)}, \qquad \frac{d^2E}{d\mathbf{x}^2}\equiv \mathbf{H}\ \text{(Hessian matrix)} \qquad \text{(A.4)}$$

Equation (A.3) is known as the Newton-Raphson step. It is the major component of almost all geometry optimization algorithms in quantum chemistry.
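As a minimal numerical sketch (not Q-Chem code), the Newton-Raphson step of Eq. (A.3) can be computed by solving the linear system $\mathbf{H}\mathbf{h}=-\mathbf{g}$ rather than forming the explicit inverse. The toy quadratic surface below is an illustrative assumption; on an exactly quadratic surface a single step lands on the stationary point.

```python
import numpy as np

def newton_raphson_step(g, H):
    """Newton-Raphson step h = -H^{-1} g of Eq. (A.3).

    Solving H h = -g is cheaper and numerically safer than
    explicitly inverting the Hessian.
    """
    return np.linalg.solve(H, -g)

# Toy quadratic surface E(x) = 1/2 x^t A x - b^t x, whose gradient is
# A x - b and whose (constant) Hessian is A.  Its stationary point is
# x* = A^{-1} b, so one Newton-Raphson step from any x0 reaches it.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.array([5.0, -5.0])

g = A @ x0 - b                      # gradient at x0
x1 = x0 + newton_raphson_step(g, A)

assert np.allclose(A @ x1 - b, 0.0)  # gradient vanishes at x1
```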

The above derivation assumed exact first (gradient) and second (Hessian) derivative information. Analytical gradients are available for all methodologies supported in Q-Chem; however, analytical second derivatives are not. Furthermore, even if they were, it would not necessarily be advantageous to use them: their evaluation is usually computationally demanding, and efficient optimizations can in fact be performed without an exact Hessian. An excellent compromise in practice is to begin with an approximate Hessian matrix, and update this using gradient and displacement information generated as the optimization progresses. In this way the starting Hessian can be “improved” at essentially no cost. Using Eq. (A.3) with an approximate Hessian is called the quasi-Newton-Raphson step.
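One widely used updating scheme of this kind is the BFGS update (the text does not specify which update Q-Chem employs, so this is an illustrative sketch only). It corrects an approximate Hessian using the displacement and the change in gradient from one cycle to the next:

```python
import numpy as np

def bfgs_update(H, s, y):
    """One BFGS update of an approximate Hessian H.

    s = x_new - x_old   (displacement)
    y = g_new - g_old   (change in gradient)
    The update enforces the secant condition H_new @ s == y and
    preserves positive definiteness provided y @ s > 0.
    """
    Hs = H @ s
    return (H
            + np.outer(y, y) / (y @ s)
            - np.outer(Hs, Hs) / (s @ Hs))

# On a quadratic surface with Hessian A, the gradient change is
# exactly y = A s, so the secant condition is easy to verify.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
H = np.eye(2)            # crude (unit) starting Hessian
s = np.array([0.5, -0.2])
y = A @ s
H = bfgs_update(H, s, y)

assert np.allclose(H @ s, y)                 # secant condition holds
assert np.all(np.linalg.eigvalsh(H) > 0.0)   # still positive definite
```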

The nature of the Hessian matrix (in particular its eigenvalue structure) plays
a crucial role in a successful optimization. All stationary points on a
potential energy surface have a zero gradient vector; however the character of
the stationary point (*i.e.*, what type of structure it corresponds to) is
determined by the Hessian. Diagonalization of the Hessian matrix can be
considered to define a set of mutually orthogonal directions on the energy
surface (the eigenvectors) together with the curvature along those directions
(the eigenvalues). At a local minimum (corresponding to a well in the potential
energy surface) the curvature along all of these directions must be positive,
reflecting the fact that a small displacement along any of these directions
causes the energy to rise. At a transition state, the curvature is negative
(*i.e.*, the energy is a maximum) along one direction, but positive along all
the others. Thus, for a stationary point to be a transition state the Hessian
matrix at that point must have one and only one negative eigenvalue, while for
a minimum the Hessian must have all positive eigenvalues. In the latter case
the Hessian is called *positive definite*. If searching for a minimum it
is important that the Hessian matrix be positive definite; in fact, unless the
Hessian is positive definite there is no guarantee that the step predicted by
Eq. (A.3) is even a descent step (*i.e.*, a direction that will actually
lower the energy). Similarly, for a transition state search, the Hessian must
have one negative eigenvalue. Maintaining the correct Hessian eigenvalue structure is
not difficult for minimization, but it can be difficult when trying to find
a transition state.
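The classification above (all eigenvalues positive for a minimum, exactly one negative for a transition state) is straightforward to check numerically; the following is a small illustrative sketch, with the tolerance an assumed parameter:

```python
import numpy as np

def classify_stationary_point(H, tol=1e-8):
    """Classify a stationary point from the eigenvalues of its Hessian.

    0 negative eigenvalues -> minimum (positive definite Hessian)
    1 negative eigenvalue  -> transition state
    2+ negative eigenvalues -> higher-order saddle point
    """
    eigvals = np.linalg.eigvalsh(H)      # H is symmetric
    n_neg = int(np.sum(eigvals < -tol))
    if n_neg == 0:
        return "minimum"
    if n_neg == 1:
        return "transition state"
    return f"higher-order saddle ({n_neg} negative eigenvalues)"

assert classify_stationary_point(np.diag([1.0, 2.0, 3.0])) == "minimum"
assert classify_stationary_point(np.diag([-1.0, 2.0, 3.0])) == "transition state"
```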

In a diagonal Hessian representation the Newton-Raphson step can be written

$$\mathbf{h}=-\sum_{i}\left(\frac{F_i}{b_i}\right)\mathbf{u}_i \qquad \text{(A.5)}$$

where $\mathbf{u}_i$ and $b_i$ are the eigenvectors and eigenvalues of the Hessian
matrix $\mathbf{H}$ and $F_i=\mathbf{u}_i^t\mathbf{g}$ is the component of $\mathbf{g}$ along the
local direction (eigenmode) $\mathbf{u}_i$. As discussed by Simons *et al.*,^{880}
the Newton-Raphson step can be considered as minimizing
along directions ${\mathbf{u}}_{i}$ which have positive eigenvalues and maximizing along
directions with negative eigenvalues. Thus, if the user is searching for a
minimum and the Hessian matrix is positive definite, then the Newton-Raphson
step is appropriate since it is attempting to minimize along all directions
simultaneously. However, if the Hessian has one or more negative eigenvalues,
then the basic Newton-Raphson step is not appropriate for a minimum search,
since it will be maximizing and not minimizing along one or more directions.
Exactly the same arguments apply during a transition state search except that
the Hessian must have one negative eigenvalue, because the user has to maximize
along one direction. However, there must be *only* one negative
eigenvalue. A positive definite Hessian is a disaster for a transition state
search because the Newton-Raphson step will then lead towards a minimum.
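As a sketch (illustrative numpy code, not Q-Chem's implementation), Eq. (A.5) can be assembled mode by mode from the eigendecomposition of $\mathbf{H}$, and the result is identical to the direct solve of Eq. (A.3):

```python
import numpy as np

def nr_step_diagonal(g, H):
    """Newton-Raphson step assembled in the diagonal Hessian
    representation, Eq. (A.5): h = -sum_i (F_i / b_i) u_i."""
    b, U = np.linalg.eigh(H)   # eigenvalues b_i, eigenvectors u_i (columns)
    F = U.T @ g                # F_i = u_i^t g, component of g along each mode
    return -U @ (F / b)

# An indefinite Hessian with one negative eigenvalue: the step
# minimizes along the positive-curvature mode and maximizes along
# the negative one, yet still equals -H^{-1} g term by term.
H = np.array([[ 2.0, 0.3],
              [ 0.3, -1.0]])
g = np.array([0.4, -0.7])
h = nr_step_diagonal(g, H)

assert np.allclose(h, np.linalg.solve(H, -g))  # identical to Eq. (A.3)
```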

If one is firmly in a region of the potential energy surface with the right Hessian character, then a careful search (based on the Newton-Raphson step) will almost always lead to a stationary point of the correct type. However, this is only true if the Hessian is exact. If an approximate Hessian is being improved by updating, then there is no guarantee that the Hessian eigenvalue structure will be retained from one cycle to the next unless one is very careful during the update. Updating procedures that “guarantee” conservation of a positive definite Hessian do exist (or at least warn the user if the update is likely to introduce negative eigenvalues). This can be very useful during a minimum search, but there are no such guarantees for preserving the Hessian character (one and only one negative eigenvalue) required for a transition state.

In addition to the difficulties in retaining the correct Hessian character,
there is the matter of obtaining a “correct” Hessian in the first instance.
This is particularly acute for a transition state search. For a minimum search
it is possible to “guess” a reasonable, positive-definite starting Hessian (for
example, by carrying out a molecular mechanics minimization initially and using
the mechanics Hessian to begin the *ab initio* optimization) but this
option is usually not available for transition states. Even if the user
calculates the Hessian exactly at the starting geometry, the guess for the
structure may not be sufficiently accurate, and the expensive, exact Hessian
may not have the desired eigenvalue structure.

Consequently, particularly for a transition state search, an alternative to the basic Newton-Raphson step is clearly needed, especially when the Hessian matrix is inappropriate for the stationary point being sought.

One of the first algorithms capable of taking corrective action during
a transition state search when the Hessian had the wrong eigenvalue structure
was developed by Poppinger,^{779} who suggested that, instead
of taking the Newton-Raphson step, if the Hessian had all positive
eigenvalues, the lowest Hessian mode be followed uphill, whereas if there were
two or more negative eigenvalues, the mode corresponding to the least negative
eigenvalue be followed downhill. While such a step should lead the user back into
the right region of the energy surface, it has the disadvantage that the user
is maximizing or minimizing along one mode only, unlike the Newton-Raphson
step, which maximizes/minimizes along all modes simultaneously. Another drawback
is that successive such steps tend to become linearly dependent, which degrades
most of the commonly used Hessian updates.
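The corrective logic can be sketched as follows. This is an illustrative reconstruction in the spirit of Poppinger's suggestion, not his published algorithm: the fixed step length and the sign convention (uphill means stepping so the gradient component along the mode increases the energy) are assumptions of this sketch.

```python
import numpy as np

def corrective_step(g, H, step_size=0.1):
    """Mode-following corrective step for a transition-state search.

    0 negative eigenvalues  -> follow the lowest Hessian mode uphill.
    2+ negative eigenvalues -> follow the least-negative mode downhill.
    Exactly 1 negative      -> structure is correct; take the NR step.
    """
    b, U = np.linalg.eigh(H)           # eigenvalues in ascending order
    n_neg = int(np.sum(b < 0.0))
    if n_neg == 1:
        return np.linalg.solve(H, -g)  # ordinary Newton-Raphson step
    if n_neg == 0:
        mode = U[:, 0]                 # lowest (softest) mode
    else:
        mode = U[:, n_neg - 1]         # least-negative mode
    # Orient the step via the gradient component along the mode:
    # to first order dE = g . h, so +sign moves uphill, -sign downhill.
    sign = 1.0 if mode @ g >= 0.0 else -1.0
    if n_neg == 0:
        return step_size * sign * mode     # uphill
    return -step_size * sign * mode        # downhill

# Positive definite Hessian: the step raises the energy to first order.
h_up = corrective_step(np.array([0.3, 0.5]), np.diag([1.0, 2.0]))
assert np.array([0.3, 0.5]) @ h_up > 0.0
```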