A.5 Constrained Optimization

Constrained optimization refers to the optimization of molecular structures in which certain parameters (e.g., bond lengths, bond angles or dihedral angles) are fixed. In quantum chemistry calculations, this has traditionally been accomplished using Z-matrix coordinates, with the desired parameter set in the Z-matrix and simply omitted from the optimization space. In 1992, Baker presented an algorithm for constrained optimization directly in Cartesian coordinates [902]. Baker’s algorithm used both penalty functions and the classical method of Lagrange multipliers [909], and was developed in order to impose constraints on a molecule obtained from a graphical model builder as a set of Cartesian coordinates. Some improvements widening the range of constraints that could be handled were made in 1993 [903]. Q-Chem includes the latest version of this algorithm, which has been modified to handle constraints directly in delocalized internal coordinates [904].

The essential problem in constrained optimization is to minimize a function of, for example, $n$ variables $F(\ensuremath{\mathbf{x}})$ subject to a series of m constraints of the form $C_ i(\ensuremath{\mathbf{x}}) = 0$ , $i=l$ . $m$ . Assuming $m<n$ , then perhaps the best way to proceed (if this were possible in practice) would be to use the $m$ constraint equations to eliminate $m$ of the variables, and then solve the resulting unconstrained problem in terms of the ( $(n-m)$ independent variables. This is exactly what occurs in a Z-matrix optimization. Such an approach cannot be used in Cartesian coordinates as standard distance and angle constraints are non-linear functions of the appropriate coordinates. For example a distance constraint (between atoms $i$ and $j$ in a molecule) is given in Cartesians by $(R_{ij} - R_0) = 0$ , with

$\begin{equation} R_{ij} =\sqrt {\left( {x_ i -x_ j } \right)^2+\left( {y_ i -y_ j } \right)^2+\left( {z_ i -z_ j } \right)^2} \end{equation}$

(A.17)

and $R_0$ the constrained distance. This obviously cannot be satisfied by elimination. What can be eliminated in Cartesians are the individual $x$ , $y$ and $z$ coordinates themselves and in this way individual atoms can be totally or partially frozen.

Internal constraints can be handled in Cartesian coordinates by introducing the Lagrangian function

$\begin{equation} \label{eq:a15} L(\ensuremath{\mathbf{x}},\lambda ) = F(\ensuremath{\mathbf{x}})-\sum \limits _{i=1}^ m {\lambda _ i C_ i(\ensuremath{\mathbf{x}})} \end{equation}$

(A.18)

which replaces the function $F(\ensuremath{\mathbf{x}})$ in the unconstrained case. Here, the $\lambda _ i$ are the so-called Lagrange multipliers, one for each constraint $C_ i(\ensuremath{\mathbf{x}})$ . Differentiating Eq. (A.18) with respect to $\ensuremath{\mathbf{x}}$ and $\lambda$ affords

	$\displaystyle \label{eq:a16a} \frac{dL(\ensuremath{\mathbf{x}},\lambda )}{dx_ j}$	$\displaystyle =$	$\displaystyle \frac{dF(\ensuremath{\mathbf{x}})}{dx_ j} -\sum \limits _{i=1}^ m \lambda _ i \frac{dC_ i(\ensuremath{\mathbf{x}})}{dx_ j}$		(A.19)
	$\displaystyle \frac{dL(\ensuremath{\mathbf{x}},\lambda )}{d\lambda _ i}$	$\displaystyle =$	$\displaystyle -C_ i (\ensuremath{\mathbf{x}})$		(A.20)

At a stationary point of the Lagrangian we have $\nabla \ensuremath{\mathbf{L}} = 0$ , i.e., all $dL/dx_ j = 0$ and all $dL/d\lambda _ i = 0$ . This latter condition means that all $C_ i(\ensuremath{\mathbf{x}}) = 0$ and thus all constraints are satisfied. Hence, finding a set of values ( $\ensuremath{\mathbf{x}}$ , $\lambda$ ) for which $\nabla \ensuremath{\mathbf{L}} = 0$ will give a possible solution to the constrained optimization problem in exactly the same way as finding an $\ensuremath{\mathbf{x}}$ for which $\ensuremath{\mathbf{g}} = \nabla \ensuremath{\mathbf{F}} = 0$ gives a solution to the corresponding unconstrained problem.

The Lagrangian second derivative matrix, which is the analogue of the Hessian matrix in an unconstrained optimization, is given by

$\begin{equation} \label{eq:a17} \nabla ^2 \ensuremath{\mathbf{L}} = \left( \begin{array}{cc} \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{dx_ j dx_ k} & \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{dx_ j d\lambda _ i} \\ \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{dx_ j d\lambda _ i} & \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{d\lambda _ j d\lambda _ i} \end{array} \right) \end{equation}$

(A.21)

where

$\displaystyle \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{dx_ j dx_ k}$	$\displaystyle =$	$\displaystyle \frac{d^2F(\ensuremath{\mathbf{x}})}{dx_ j dx_ k} -\sum \limits _{i=1}^ m \lambda _ i \frac{d^2C_ i(\ensuremath{\mathbf{x}})}{dx_ j dx_ k}$	(A.22)
$\displaystyle \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{dx_ j d\lambda _ i}$	$\displaystyle =$	$\displaystyle \frac{-dC_ i(\ensuremath{\mathbf{x}})}{dx_ j}$	(A.23)
$\displaystyle \frac{d^2L(\ensuremath{\mathbf{x}},\lambda )}{d\lambda _ j d\lambda _ i}$	$\displaystyle =$	$\displaystyle 0$	(A.24)

Thus, in addition to the standard gradient vector and Hessian matrix for the unconstrained function $F(\ensuremath{\mathbf{x}})$ , we need both the first and second derivatives (with respect to coordinate displacement) of the constraint functions. Once these quantities are available, the corresponding Lagrangian gradient, given by Eq. (A.19), and Lagrangian second derivative matrix, given by Eq. (A.21), can be formed, and the optimization step calculated in a similar manner to that for a standard unconstrained optimization [902].

In the Lagrange multiplier method, the unknown multipliers, $\lambda _ i$ , are an integral part of the parameter set. This means that the optimization space consists of all $n$ variables $\ensuremath{\mathbf{x}}$ plus all $m$ Lagrange multipliers $\lambda$ , one for each constraint. The total dimension of the constrained optimization problem, $nm$ , has thus increased by $m$ compared to the corresponding unconstrained case. The Lagrangian Hessian matrix, $\nabla ^{2}\ensuremath{\mathbf{L}}$ , has $m$ extra modes compared to the standard (unconstrained) Hessian matrix, $\nabla ^2\ensuremath{\mathbf{F}}$ . What normally happens is that these additional modes are dominated by the constraints (i.e., their largest components correspond to the constraint Lagrange multipliers) and they have negative curvature (a negative Hessian eigenvalue). This is perhaps not surprising when one realizes that any motion in the parameter space that breaks the constraints is likely to lower the energy.

Compared to a standard unconstrained minimization, where a stationary point is sought at which the Hessian matrix has all positive eigenvalues, in the constrained problem we are looking for a stationary point of the Lagrangian function at which the Lagrangian Hessian matrix has as many negative eigenvalues as there are constraints (i.e., we are looking for an $m$ th-order saddle point). For further details and practical applications of constrained optimization using Lagrange multipliers in Cartesian coordinates, see [902].

Eigenvector following can be implemented in a constrained optimization in a similar way to the unconstrained case. Considering a constrained minimization with $m$ constraints, then Eq. (A.11) is replaced by

$\begin{equation} \label{eq:a18a} \left( {{\begin{array}{*{20}c} {b_1 } \hfill & \hfill & \hfill & {F_1 } \hfill \\ \hfill & \ddots \hfill & {\rm {\bf 0}} \hfill & \vdots \hfill \\ \hfill & {\rm {\bf 0}} \hfill & {b_ m } \hfill & {F_ m } \hfill \\ {F_1 } \hfill & \cdots \hfill & {F_ m } \hfill & 0 \hfill \\ \end{array} }} \right)\left( {{\begin{array}{*{20}c} {h_1 } \hfill \\ \vdots \hfill \\ {h_ m } \hfill \\ 1 \hfill \\ \end{array} }} \right)=\lambda _ p \left( {{\begin{array}{*{20}c} {h_1 } \hfill \\ \vdots \hfill \\ {h_ m } \hfill \\ 1 \hfill \\ \end{array} }} \right) \end{equation}$

(A.25)

and Eq. (A.12) by

$\begin{equation} \label{eq:a18b} \left( {{\begin{array}{*{20}c} {b_{m+1} } \hfill & \hfill & \hfill & {F_{m+1} } \hfill \\ \hfill & \ddots \hfill & {\rm {\bf 0}} \hfill & \vdots \hfill \\ \hfill & {\rm {\bf 0}} \hfill & {b_{m+n} } \hfill & {F_{m+n} } \hfill \\ {F_{m+1} } \hfill & \cdots \hfill & {F_{m+n} } \hfill & 0 \hfill \\ \end{array} }} \right)\left( {{\begin{array}{*{20}c} {h_{m+1} } \hfill \\ \vdots \hfill \\ {h_{m+n} } \hfill \\ 1 \hfill \\ \end{array} }} \right)=\lambda _ n \left( {{\begin{array}{*{20}c} {h_{m+1} } \hfill \\ \vdots \hfill \\ {h_{m+n} } \hfill \\ 1 \hfill \\ \end{array} }} \right) \end{equation}$

(A.26)

where now the $b_ i$ are the eigenvalues of $\nabla ^2\ensuremath{\mathbf{L}}$ , with corresponding eigenvectors $\ensuremath{\mathbf{u}}_ i$ , and $F_ i~ =~ \ensuremath{\mathbf{u}}_ i^ t\nabla \ensuremath{\mathbf{L}}$ . Here Eq. (A.25) includes the $m$ constraint modes along which a negative Lagrangian Hessian eigenvalue is required, and Eq. (A.26) includes all the other modes.

Equations (A.25) and (A.26) implement eigenvector following for a constrained minimization. Constrained transition state searches can be carried out by selecting one extra mode to be maximized in addition to the $m$ constraint modes, i.e., by searching for a saddle point of the Lagrangian function of order $m+l$ .

It should be realized that, in the Lagrange multiplier method, the desired constraints are only satisfied at convergence, and not necessarily at intermediate geometries. The Lagrange multipliers are part of the optimization space; they vary just as any other geometrical parameter and, consequently the degree to which the constraints are satisfied changes from cycle to cycle, approaching 100% satisfied near convergence. One advantage this brings is that, unlike in a standard Z-matrix approach, constraints do not have to be satisfied in the starting geometry.

Imposed constraints can normally be satisfied to very high accuracy, $10^{-6}$ or better. However, problems can arise for both bond and dihedral angle constraints near $0^{\circ }$ and $180^{\circ }$ and, instead of attempting to impose a single constraint, it is better to split angle constraints near these limiting values into two by using a dummy atom Baker:1993b, exactly analogous to splitting a $180^{\circ }$ bond angle into two $90^{\circ }$ angles in a Z-matrix.

Note: Exact $0^{\circ }$ and $180^{\circ }$ single angle constraints cannot be imposed, as the corresponding constraint normals, $\nabla \ensuremath{\mathbf{C}}_ i$ , are zero, and would result in rows and columns of zeros in the Lagrangian Hessian matrix.