- Search
- Download PDF

Constrained optimization refers to the optimization of molecular structures in
which certain parameters (*e.g.*, bond lengths, bond angles or dihedral angles)
are fixed. In quantum chemistry calculations, this has traditionally been
accomplished using *Z*-matrix coordinates, with the desired parameter set in
the *Z*-matrix and simply omitted from the optimization space. In 1992, Baker
presented an algorithm for constrained optimization directly in Cartesian
coordinates.^{33} Baker’s algorithm used both penalty functions
and the classical method of Lagrange multipliers,^{258} and was
developed in order to impose constraints on a molecule obtained from a
graphical model builder as a set of Cartesian coordinates. Some improvements
widening the range of constraints that could be handled were made in 1993.^{29}
Q-Chem includes the latest version of this algorithm,
which has been modified to handle constraints directly in delocalized internal
coordinates.^{34}

The essential problem in constrained optimization is to minimize a function of,
for example, $n$ variables $F(\mathbf{x})$ subject to a series of m constraints of
the form ${C}_{i}(\mathbf{x})=0$ fpr $i=\mathrm{\ell},\mathrm{\dots},m$. Assuming $$, then perhaps the best
way to proceed (if this were possible in practice) would be to use the $m$
constraint equations to eliminate $m$ of the variables, and then solve the
resulting unconstrained problem in terms of the $n-m$ independent variables.
This is exactly what occurs in a *Z*-matrix optimization. Such an approach
cannot be used in Cartesian coordinates as standard distance and angle
constraints are non-linear functions of the appropriate coordinates. For
example a distance constraint (between atoms $i$ and $j$ in a molecule) is
given in Cartesians by $({R}_{ij}-{R}_{0})=0$, with

$${R}_{ij}={\left[{({x}_{i}-{x}_{j})}^{2}+{({y}_{i}-{y}_{j})}^{2}+{({z}_{i}-{z}_{j})}^{2}\right]}^{1/2}$$ | (A.17) |

and ${R}_{0}$ the constrained distance. This obviously cannot be satisfied by elimination. What can be eliminated in Cartesians are the individual $x$, $y$ and $z$ coordinates themselves and in this way individual atoms can be totally or partially frozen.

Internal constraints can be handled in Cartesian coordinates by introducing the Lagrangian function

$$L(\mathbf{x},\lambda )=F(\mathbf{x})-\sum _{i=1}^{m}{\lambda}_{i}{C}_{i}(\mathbf{x})$$ | (A.18) |

which replaces the function $F(\mathbf{x})$ in the unconstrained case. Here, the ${\lambda}_{i}$ are the so-called Lagrange multipliers, one for each constraint ${C}_{i}(\mathbf{x})$. Differentiating Eq. (A.18) with respect to $\mathbf{x}$ and $\lambda $ affords

$\frac{dL(\mathbf{x},\lambda )}{d{x}_{j}}$ | $=$ | $\frac{dF(\mathbf{x})}{d{x}_{j}}}-{\displaystyle \sum _{i=1}^{m}}{\lambda}_{i}\left({\displaystyle \frac{d{C}_{i}(\mathbf{x})}{d{x}_{j}}}\right)$ | (A.19) | ||

$\frac{dL(\mathbf{x},\lambda )}{d{\lambda}_{i}}$ | $=$ | $-{C}_{i}(\mathbf{x})$ | (A.20) |

At a stationary point of the Lagrangian we have $\widehat{\mathbf{\nabla}}\mathbf{L}=\mathrm{\U0001d7ce}$, *i.e.*, all
$dL/d{x}_{j}=0$ and all $dL/d{\lambda}_{i}=0$. This latter condition means that all
${C}_{i}(\mathbf{x})=0$ and thus all constraints are satisfied. Hence, finding a set
of values ($\mathbf{x}$, $\lambda $) for which $\widehat{\mathbf{\nabla}}\mathbf{L}=\mathrm{\U0001d7ce}$ will give a
possible solution to the constrained optimization problem in exactly the same
way as finding an $\mathbf{x}$ for which $\mathbf{g}=\widehat{\mathbf{\nabla}}\mathbf{F}=\mathrm{\U0001d7ce}$ gives a solution
to the corresponding unconstrained problem.

The Lagrangian second derivative matrix, which is the analogue of the Hessian matrix in an unconstrained optimization, is given by

$${\widehat{\nabla}}^{2}\mathbf{L}=\left(\begin{array}{cc}\hfill \frac{{d}^{2}L(\mathbf{x},\lambda )}{d{x}_{j}d{x}_{k}}\hfill & \hfill \frac{{d}^{2}L(\mathbf{x},\lambda )}{d{x}_{j}d{\lambda}_{i}}\hfill \\ \hfill \frac{{d}^{2}L(\mathbf{x},\lambda )}{d{x}_{j}d{\lambda}_{i}}\hfill & \hfill \frac{{d}^{2}L(\mathbf{x},\lambda )}{d{\lambda}_{j}d{\lambda}_{i}}\hfill \end{array}\right)$$ | (A.21) |

where

$\frac{{d}^{2}L(\mathbf{x},\lambda )}{d{x}_{j}d{x}_{k}}$ | $={\displaystyle \frac{{d}^{2}F(\mathbf{x})}{d{x}_{j}d{x}_{k}}}-{\displaystyle \sum _{i=1}^{m}}{\lambda}_{i}\left({\displaystyle \frac{{d}^{2}{C}_{i}(\mathbf{x})}{d{x}_{j}d{x}_{k}}}\right)$ | (A.22) | ||

$\frac{{d}^{2}L(\mathbf{x},\lambda )}{d{x}_{j}d{\lambda}_{i}}$ | $=-\left({\displaystyle \frac{d{C}_{i}(\mathbf{x})}{d{x}_{j}}}\right)$ | (A.23) | ||

$\frac{{d}^{2}L(\mathbf{x},\lambda )}{d{\lambda}_{j}d{\lambda}_{i}}$ | $=0.$ | (A.24) |

Thus, in addition to the standard gradient vector and Hessian matrix for the
unconstrained function $F(\mathbf{x})$, we need both the first and second
derivatives (with respect to coordinate displacement) of the constraint
functions. Once these quantities are available, the corresponding Lagrangian
gradient, given by Eq. (A.19), and Lagrangian second derivative matrix, given by
Eq. (A.21), can be formed, and the optimization step calculated in a similar
manner to that for a standard unconstrained optimization.^{33}

In the Lagrange multiplier method, the unknown multipliers, ${\lambda}_{i}$, are an
integral part of the parameter set. This means that the optimization space
consists of all $n$ variables $\mathbf{x}$ plus all $m$ Lagrange multipliers
$\lambda $, one for each constraint. The total dimension of the constrained
optimization problem, $nm$, has thus increased by $m$ compared to the
corresponding unconstrained case. The Lagrangian Hessian matrix,
${\widehat{\nabla}}^{2}\mathbf{L}$, has $m$ extra modes compared to the standard (unconstrained)
Hessian matrix, ${\widehat{\nabla}}^{2}\mathbf{F}$. What normally happens is that these additional
modes are dominated by the constraints (*i.e.*, their largest components
correspond to the constraint Lagrange multipliers) and they have negative
curvature (a negative Hessian eigenvalue). This is perhaps not surprising when
one realizes that any motion in the parameter space that breaks the constraints
is likely to lower the energy.

Compared to a standard unconstrained minimization, where a stationary point is
sought at which the Hessian matrix has all positive eigenvalues, in the
constrained problem we are looking for a stationary point of the Lagrangian
function at which the Lagrangian Hessian matrix has as many negative
eigenvalues as there are constraints (*i.e.*, we are looking for an $m$th-order
saddle point). For further details and practical applications of
constrained optimization using Lagrange multipliers in Cartesian coordinates;
see Ref. 33.

Eigenvector following can be implemented in a constrained optimization in a similar way to the unconstrained case. Considering a constrained minimization with $m$ constraints, then Eq. (A.11) is replaced by

$$\left(\begin{array}{cccccccccccccccccccc}\hfill {b}_{1}\hfill & & & \hfill {F}_{1}\hfill & & & & & & & & & & & & & & & & \\ & \hfill \mathrm{\ddots}\hfill & \hfill \mathrm{\U0001d7ce}\hfill & \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & \\ & \hfill \mathrm{\U0001d7ce}\hfill & \hfill {b}_{m}\hfill & \hfill {F}_{m}\hfill & & & & & & & & & & & & & & & & \\ \hfill {F}_{1}\hfill & \hfill \mathrm{\cdots}\hfill & \hfill {F}_{m}\hfill & \hfill 0\hfill & & & & & & & & & & & & & & & & \end{array}\right)\left(\begin{array}{cccccccccccccccccccc}\hfill {h}_{1}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill {h}_{m}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill 1\hfill & & & & & & & & & & & & & & & & & & & \end{array}\right)={\lambda}_{p}\left(\begin{array}{cccccccccccccccccccc}\hfill {h}_{1}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill {h}_{m}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill 1\hfill & & & & & & & & & & & & & & & & & & & \end{array}\right)$$ | (A.25) |

and Eq. (A.12) by

$$\left(\begin{array}{cccccccccccccccccccc}\hfill {b}_{m+1}\hfill & & & \hfill {F}_{m+1}\hfill & & & & & & & & & & & & & & & & \\ & \hfill \mathrm{\ddots}\hfill & \hfill \mathrm{\U0001d7ce}\hfill & \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & \\ & \hfill \mathrm{\U0001d7ce}\hfill & \hfill {b}_{m+n}\hfill & \hfill {F}_{m+n}\hfill & & & & & & & & & & & & & & & & \\ \hfill {F}_{m+1}\hfill & \hfill \mathrm{\cdots}\hfill & \hfill {F}_{m+n}\hfill & \hfill 0\hfill & & & & & & & & & & & & & & & & \end{array}\right)\left(\begin{array}{cccccccccccccccccccc}\hfill {h}_{m+1}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill {h}_{m+n}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill 1\hfill & & & & & & & & & & & & & & & & & & & \end{array}\right)={\lambda}_{n}\left(\begin{array}{cccccccccccccccccccc}\hfill {h}_{m+1}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill \mathrm{\vdots}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill {h}_{m+n}\hfill & & & & & & & & & & & & & & & & & & & \\ \hfill 1\hfill & & & & & & & & & & & & & & & & & & & \end{array}\right)$$ | (A.26) |

where now the ${b}_{i}$ are the eigenvalues of ${\widehat{\nabla}}^{2}\mathbf{L}$, with corresponding eigenvectors ${\mathbf{u}}_{i}$, and ${F}_{i}={\mathbf{u}}_{i}^{t}\widehat{\nabla}\mathbf{L}$. Here Eq. (A.25) includes the $m$ constraint modes along which a negative Lagrangian Hessian eigenvalue is required, and Eq. (A.26) includes all the other modes.

Equations (A.25) and (A.26) implement eigenvector following
for a constrained minimization. Constrained transition state searches can be
carried out by selecting one extra mode to be maximized in addition to the $m$
constraint modes, *i.e.*, by searching for a saddle point of the Lagrangian
function of order $m+\mathrm{\ell}$.

It should be realized that, in the Lagrange multiplier method, the desired
constraints are only satisfied at convergence, and not necessarily at
intermediate geometries. The Lagrange multipliers are part of the optimization
space; they vary just as any other geometrical parameter and, consequently the
degree to which the constraints are satisfied changes from cycle to cycle,
approaching 100% satisfied near convergence. One advantage this brings is
that, unlike in a standard *Z*-matrix approach, constraints do not have to be
satisfied in the starting geometry.

Imposed constraints can normally be satisfied to very high accuracy, ${10}^{-6}$
or better. However, problems can arise for both bond and dihedral angle
constraints near ${0}^{\circ}$ and ${180}^{\circ}$ and, instead of attempting to
impose a single constraint, it is better to split angle constraints near these
limiting values into two by using a dummy atom,^{29} exactly
analogous to splitting a ${180}^{\circ}$ bond angle into two ${90}^{\circ}$ angles
in a *Z*-matrix.

Note: Exact ${\mathrm{0}}^{\mathrm{\circ}}$ and ${\mathrm{180}}^{\mathrm{\circ}}$ single angle constraints cannot be imposed, as the corresponding constraint normals, $\widehat{\mathbf{\nabla}}\mathbf{}{\mathbf{C}}_{i}$, are zero, and would result in rows and columns of zeros in the Lagrangian Hessian matrix.