Gaussian 03 Online Manual
Last update: 4 April 2003

Efficiency Considerations

Gaussian has been designed to work efficiently given a variety of computer configurations. In general, the program attempts to select the most efficient algorithm given the memory and disk constraints imposed upon it. Since Gaussian does offer a wide choice of algorithms, an understanding of the possibilities and tradeoffs can help you to achieve optimal performance.

Before proceeding, however, let us emphasize two very important points:

  • The default algorithms selected by the program give good performance for all but very large jobs. Note that some defaults have changed with Gaussian 03 to reflect current typical problem sizes. Defaults used in earlier versions of the program were designed for small jobs of under 100 basis functions. The default algorithms used in Gaussian are generally designed for longer jobs.

  • For users or sites who routinely run very large jobs, the following defaults placed in the Default.Route file will produce good general performance:

     -M- available-memory
     # MaxDisk=available-disk
  • where the amount of available memory and disk are specified as indicated; the default units for each are 8-byte words, and either value may be followed by KB, MB, GB, KW, MW or GW (without intervening spaces) to specify units of kilo-, mega- or giga- bytes or words. Once the Default.Route file is set up, for many sites, no other special actions are required for overall efficient program use. The default memory size is 6MW.

Estimating Calculation Memory Requirements

The following formula can be used to estimate the memory requirement of various types of Gaussian jobs (in 8-byte words):

M + 2NB2

where NB is the number of basis functions used in the calculation, and M is a minimum value that depends on the job type, given in the following table:

  Highest Angular Momentum Basis Function
Job Type f functions g functions h functions i functions j functions
SCF Energies 4 MW 4 MW 9 MW 23 MW ~60 MW
SCF Gradients 4 MW 5 MW 16 MW 38 MW
SCF Frequencies 4 MW 9 MW 27 MW
MP2 Energies 4 MW 5 MW 10 MW 28 MW ~70 MW
MP2 Gradients 4 MW 6 MW 16 MW 38 MW
MP2 Frequencies 6 MW 10 MW 28 MW

For example, on a 32-bit system, a 300 basis function HF geometry optimization using g functions would require about 5.2 MW (~42 MB) of memory.

Note that 1 MW = 1,048,576 words (= 8,388,608 bytes). The values in the table are for 32-bit computer systems; they would need to be doubled for 64-bit systems. They also reflect the use of uncontracted higher angular momentum functions-f and above-which is the default type. Larger amounts of memory may be required for derivatives of contracted high angular momentum functions.

The remainder of this chapter is designed for users who wish to understand more about the tradeoffs inherent in the various choices in order to obtain optimal performance for an individual job, not just good overall performance. Techniques for both very large and small jobs will be covered. Additional, related information may be found in reference [572].

Memory Requirements for Parallel Calculations

When using multiple processors with shared memory, a good estimate of the memory required is the amount of memory from the preceding table for each processor. Thus, if the value from the table is 10 MW and you want to use four shared memory processors, set %Mem to be at least 40 MW.

For distributed memory calculations (i.e., those performed via Linda), the amount of memory specified in %Mem should be equal to or greater than the value from the preceding table.

In Gaussian 03, these two parallelization methods can be combined. For example, you would use the following directive in order to run a job on 8 CPUs located on four two-headed shared memory multiprocessors (assuming that the memory value from the table is 10 MW):

%Mem=20MW                Memory required by each multiprocessor.
%NProcLinda=4            Use four Linda workers (one per multiprocessor).
%NProcShared=2           Use two shared memory processors on each multiprocessor computer.

Storage, Transformation, and Recomputation of Integrals

One of the most important performance-related choices is the way in which the program processes the numerous electron repulsion integrals. There are five possible approaches to handling two-electron repulsion integrals implemented in Gaussian:

AO
The two-electron integrals over the atomic orbitals (AO integrals) are generated once and stored externally on disk. This is the approach used by conventional SCF calculations.

MO
The AO integrals are generated once and stored externally, then transformed to the molecular orbital basis. The transformed (MO) integrals are also stored externally. This is the approach used by earlier versions of Gaussian for all correlated energy methods.

Direct
The AO integrals (and possibly integral derivatives) are recomputed as needed. This does not require O(N4) internal or external storage, but does potentially involve additional computational effort. In some cases, other savings are possible that compensate for this additional effort. In any case, direct methods are the only choice when memory and disk are exhausted and consequently are inevitably used for the largest calculations. In contrast to earlier versions of the program, the direct method is the default for SCF calculations in Gaussian .

Semi-Direct
The AO integrals (and possibly integral derivatives) are recomputed as needed. In addition, MO quantities are stored temporarily on disk in whatever size chunks fit in the available disk space.

In-Core
The AO integrals are generated once and stored in canonical order in main memory (i.e., including zeroes). This requires large amounts of memory, but allows the integrals to be processed using simple matrix operations and no I/O, and consequently is very fast.

At least two of these approaches are available for all methods in Gaussian. The default method for a given job is chosen to give good performance on small to medium sized molecules. The various options and tradeoffs for each method are described in the following sections.

SCF Energies and Gradients

The performance issues that arise for SCF calculations include how the integrals are to be handled, and which alternative calculation method to select in the event that the default procedure fails to converge.

Integral Storage

By default, SCF calculations use the direct algorithm. It might seem that direct SCF would be preferred only when disk space is insufficient. However, this is not the case in practice. Because of the use of cutoffs, the cost of direct SCF scales with molecular size as N2.7 or better, while conventional SCF scales in practice as N3.5 [572]. Consequently, a point is reached fairly quickly where recomputing the integrals (really, only those integrals that are needed) actually consumes less CPU time than relying on external storage. Where this crossover occurs depends on how fast the integral evaluation in direct SCF is, and it varies from machine to machine. However, on modern computer systems, the most efficient strategy is to do an in-core SCF as long as it is feasible, and use the direct algorithm from that point on; the conventional algorithm is virtually never a good choice on such systems.

The change to direct SCF as the default algorithm in Gaussian 98 was made in consideration of these facts. SCF=Conven keyword is only needed on small memory computer systems like obsolete PCs.

In-core SCF is also available. Direct SCF calculations that have enough memory to store the integrals are automatically converted to in-core runs. SCF=InCore can be requested explicitly, in which case the job will be terminated if insufficient memory is available to store the integrals. Generally, about N4/8 + 500,000 words of memory are necessary for closed-shell in-core SCF, and N4/4 + 500,000 words for UHF or ROHF in-core SCF. This corresponds to about 100 MB for a 100 basis function job, 1.6 GB for a 200 basis function job, and 8.1 GB for a 300 basis function job (closed-shell).

GVB and MCSCF calculations can also be done using direct or in-core algorithms [405]. Memory requirements are similar to the open-shell Hartree-Fock case described above. The primary difference is that many Fock operators must be formed in each iteration. For GVB, there are 2Norb operators, where Norb is the number of orbitals in GVB pairs. For MCSCF, there are Nactive(Nactive-1)/2 + 1 operators, where Nactive is the number of orbitals in the active space. Consequently:

  • Cutoffs are less effective than for Hartree-Fock, so the crossover in efficiency is at a larger number of basis functions.

  • The number of operators can be quite large for larger MCSCF active spaces, so performance can be improved by ensuring that enough memory is available to hold all the density and operator matrices at once. Otherwise, the integrals will be evaluated more than once per iteration.

Direct SCF Procedure

In order to speed up direct HF calculations, the iterations are done in two phases:

  • The density is converged to about 10-5 using integrals accurate to six digits and a modest integration grid in DFT calculations. This step is terminated after 21 iterations even if it is not fully converged. This step is omitted by default if any transition metal atoms are present.

  • The density is then converged to 10-8 using integrals accurate to ten digits, allowing up to a total of 64 cycles total for the two steps.

This approach is substantially faster than using full integral accuracy throughout without slowing convergence in all cases tested so far. In the event of difficulties, full accuracy of the integrals throughout can be requested using SCF=NoVarAcc, at the expense of additional CPU time. See the discussion of the SCF keyword for more details.

Single-Point Direct SCF Convergence

In order to improve performance for single-point direct and in-core SCF calculations, a modification of the default SCF approach is used:

  • The integrals are done to only 10-6 accuracy except for all-electron (non-ECP) calculations involving molecules containing atoms heavier than argon.

  • The SCF is converged to either 10-4 on both the energy and density, or to 10-5 on the energy, whichever comes first.

This is sufficient accuracy for the usual uses of single-point SCF calculations, including relative energies, population analysis, multipole moments, electrostatic potentials, and electrostatic potential derived charges. Conventional SCF single points and all jobs other than single points use tight convergence of 10-8 on the density. The tighter convergence can be applied to single-point direct SCF by requesting SCF=Tight. See the discussion of the SCF keyword for more details.

Problem Convergence Cases

The default SCF algorithm now uses a combination of two Direct Inversion in the Iterative Subspace (DIIS) extrapolation methods EDIIS and CDIIS. EDIIS [559] uses energies for extrapolation, and it dominates the early iterations of the SCF convergence process. CDIIS, which performs extrapolation based on the commutators of the Fock and density matrices, handles the latter phases of SCF convergence. This new algorithm is very reliable, and previously troublesome SCF convergence cases now almost always converge with the default algorithm. For the few remaining pathological convergence cases, Gaussian 03 offers Fermi broadening and damping in combination with CDIIS (including automatic level shifting).

These are the available alternatives if the default approach fails to converge (labeled by their corresponding keyword):

SCF=Fermi
Requests temperature broadening during early iterations [562], combined with CDIIS and dynamic damping of early SCF iterations.

SCF=QC
This is quadratically convergent SCF, based on the method of Bacskay [563]. Since it combines linear minimizations with the Newton-Raphson algorithm suggested by Bacskay, it is guaranteed to reach a stationary point eventually. Typically, SCF=QC is about twice as expensive as conventional SCF. Since SCF=QC is reliable and can be used for direct SCF, it is usually the first choice if convergence problems are encountered. It can be used for RHF and UHF, but not for complex or ROHF.

Guess=Alter
Sometimes convergence difficulties are a warning that the initial guess has occupied the wrong orbitals. The guess should be examined, especially as to the symmetries of the occupied orbitals. Guess=Alter can be used to modify the orbitals selected for occupation.

SCF=NoDIIS
This implies conventional SCF using the old 3 and 4 point extrapolation. It is not usually a good choice for RHF and UHF, but is sometimes helpful for ROHF, for which there are fewer alternatives. More than 64 cycles may be needed for convergence.

SCF=(MaxCyc=N)
Increases the total number of SCF iterations to N. Note that merely increasing the number of SCF cycles for the default algorithm is rarely helpful.

SCF=DM
This is the older steepest descent algorithm of Seeger [564]. It can be used for complex HF, but not for direct HF or DFT.

These approaches all tend to force convergence to the closest stationary point in the orbital space, which may not be a minimum with respect to orbital rotations. A stability calculation can be used to verify that a proper SCF solution has been obtained (see the Stable keyword). Note also that you should verify that the final wavefunction corresponds to the desired electronic state, especially when using Guess=Alter.

SCF Frequencies

Four alternatives for integral processing are available for Hartree-Fock second derivatives:

Direct
The coupled perturbed Hartree-Fock (CPHF) equations are solved using integrals that are recomputed every iteration. Since cutoffs are not as effective for direct CPHF as for direct SCF, the crossover to direct being faster is higher, but even for 100 basis functions, direct frequencies are only about 40% slower than conventional (AO). Hence, as for direct SCF, the direct algorithm is preferred above 100 basis functions or so on vector machines and must be used when disk is exhausted on scalar machines. This algorithm is the default.

AO
The CPHF equations are solved using the written-out AO integrals. The petit (symmetry reduced) list can be used. This may be the optimal choice for jobs of up to about 100 basis functions.

MO
The CPHF equations are solved using transformed integrals. This is the only method used in most other electronic structure programs and is a bit faster than using the AO basis for small cases, but it is basically a waste of disk space. It is selected by specifying CPHF=MO in the route section. This option is not available for DFT methods.

In-Core
The integrals are stored in memory in canonical order. Memory requirements are the same as for in-core SCF. This is the fastest available method when it can be used. This algorithm is selected using SCF=InCore.

By default, during in-core frequencies, the integrals are computed once by each link that needs them. This keeps the disk storage down to the same modest amount as for direct-O(N3). If N4/8 disk is available, and in-core is being used only for speed, then specifying SCF=(InCore,Pass) will cause the integrals to be stored on disk (on the read-write file) after they are computed for the first time, and then read from disk rather than be recomputed by later steps.

HF frequency calculations include prediction of the infrared and Raman vibrational intensities by default. The IR intensities add negligible overhead to the calculation, but the Raman intensities add 10-20%. If the Raman intensities are not of interest, they can be suppressed by specifying Freq=NoRaman.

Freq=Raman produces Raman intensities by numerical differentiation for DFT and MP2 frequency calculations. Using this option does not change the calculation's disk requirements, but it will increase the CPU time for the job. Computing pre-resonance Raman intensities (with CPHF=RdFreq) will approximately double the job's CPU requirements.

While frequency calculations can be done using very modest amounts of memory, performance on very large jobs will be considerably better if enough memory is available to complete the major steps in one pass. Link 1110 must form a "skeleton derivative Fock matrix" for every degree of freedom (i.e., 3 x Number-of-atoms) and if only some of the matrices can be held in memory, it will compute the integral derivatives more than once. Similarly, in every iteration of the CPHF solutions, link 1002 must form updates to all the derivative Fock matrices. Link 1110 requires 3NAN2/2 words of memory, plus a constant amount for the integral derivatives to run optimally. Link 1002 requires 3NAN2 words, plus a constant amount, to run optimally.

The freqmem utility program returns the optimal memory size for different parameters of frequency calculation (i.e., the amount required to perform the major steps in a single pass).

MP2 Energies

Four algorithms are available for MP2, but most of the decision-making is done automatically by the program. The critical element of this decision making is the value of MaxDisk, which should be set according to your particular system configuration (see chapter 3). It indicates the maximum amount of disk space available in words. If no value is specified for MaxDisk, either in the route section or in the Default.Route file, Gaussian will assume that enough disk is available to perform the calculation with no redundant work, which may not be the case for larger runs. Thus, specifying the amount of available memory and disk is by far the most important way of optimizing performance for MP2 calculations. Doing so allows the program to decide between the various available algorithms, selecting the optimal one for your particular system configuration. This is best accomplished with -M- directive and MaxDisk keyword in the Default.Route file (although MaxDisk and %Mem may be included in the input file).

The algorithms available for MP2 energies are:

Semi-Direct
The AO integrals are generated as needed. The half-transformed integrals (ip|λσ) over one or more occupied orbitals i are sorted on disk. This method can function in as little as O(N2) memory and N3 disk and is usually the optimal choice. It is specified with MP2=SemiDirect.

In-Core
The AO integrals are generated once and stored in canonical order in memory. N4/4 memory is required. This is very fast if sufficient memory is available. This algorithm can be specified with MP2=InCore, which does in-core SCF and MP2.

FullDirect
The AO integrals are recomputed as needed during evaluation of E(2). No external storage is required. The number of integral evaluations depends on the amount of memory available. This is a good method only for machines with large amounts of physical memory. It is specified with MP2=FullDirect.

Conventional
The AO integrals are written out and transformed, then the MO integrals are antisymmetrized to produce E(2). This was the default algorithm in Gaussian 88 and earlier versions. The MP2=Conven keyword forces this conventional MP2 algorithm. While the new (semi-direct) algorithm can function well for very large N in modest memory, it does have a fixed minimum memory requirement of about one million words for basis sets containing only s, p, and d functions. The old code, which is slower on all machines, can be run in very small memory and may be needed on low-end machines.

Use=L903
The AO integrals are written out, then the transformation and formation of E(2) are done in memory. N3/2 memory is necessary. This was an option in Gaussian 88 if the energy but not the gradient was desired. This algorithm is selected with Use=L903.

In addition, when the direct, semi-direct, and in-core MP2 algorithms are used, the SCF phase can be either conventional, direct, or in-core. The default is direct or in-core SCF.

MP2 Gradients

The choices for MP2 gradients are much the same as for MP2 energies, except:

  • The conventional algorithm requires the storage of the two-particle density matrix and therefore uses considerably more disk than if only energies are needed. The new methods require no more disk space for gradients than for the corresponding energies.

  • The modern methods compute the integral derivatives at least twice, once in the E2 phase and once after the CPHF step. As a result, for small systems (50 basis functions and below) on scalar machines, the conventional algorithm is somewhat faster.

  • The integral derivative evaluation during E2 in the new algorithms requires extra main memory if higher than f functions are used.

As for the MP2 energy, the default is to do direct or in-core SCF and then dynamically choose between semi-direct, direct, or in-core E2.

MP2 Frequencies

Only semi-direct methods are available for analytic MP2 second derivatives. These reduce the disk storage required below what a conventional algorithm requires.

MP2 frequency jobs also require significant amounts of memory. The default of six million words should be increased for larger jobs. If f functions are used, eight million words should be provided for computer systems using 64-bit integers.

Higher Correlated Methods

The correlation methods beyond MP2 (MP3, MP4, CCSD, CISD, QCISD, etc.) all require that some transformed (MO) integrals be stored on disk and thus (unlike MP2 energies and gradients) have disk space requirements that rise quartically with the size of the molecule. There are, however, several alternatives as to how the transformed integrals are generated, how many are stored, and how the remaining terms are computed:

  • z$$f$$>The default in Gaussian is a semi-direct algorithm. The AO integrals may be written out for use in the SCF phase of the calculation or the SCF may be done directly or in-core. The transformation recomputes the AO integrals as needed and leaves only the minimum number of MO integrals on disk (see below). The remaining terms are computed by recomputing AO integrals.

  • A full transformation is performed if MaxDisk supplies sufficient disk for doing so. This will be faster than other approaches unless the computer system's I/O is very slow.

  • The conventional algorithm, which was the default in Gaussian 90, involves storing the AO integrals on disk, reading them back during the transformation, and forming all of the MO two-electron integrals except those involving four virtual orbitals. The four virtual terms were computed by reading the AO integrals. This procedure can be requested in Gaussian by specifying Tran=Conven in the route section. However, it is appropriate only on very slow machines like legacy PCs.

If a post-SCF calculation can be done using a full integral transformation while keeping disk usage under MaxDisk, this is done; if not, a partial transformation is done and some terms are computed in the AO basis. Thus, it is crucial for a value for MaxDisk to be specified explicitly for these types of jobs, either within the route section or via a system wide setting in the Default.Route file. If MaxDisk is left unset, the program assumes that disk is abundant and performs a full transformation by default. If MaxDisk is not set and sufficient disk space is not available for a full transformation, the job will fail.

The following points summarize the effect of MaxDisk for post-SCF methods:

  • CID, CISD, CCD, BD, and QCISD energies also have a fixed storage requirement proportional      to O2N2, with a large factor, but obey MaxDisk in avoiding larger storage requirements.

  • CCSD, CCSD(T), QCISD(T), and BD(T) energies have fixed disk requirements proportional to ON3 which cannot be limited by MaxDisk.

  • CID, CISD, CCD, QCISD densities and CCSD gradients have fixed disk requirements of about N4/2 for closed-shell and 3N4/4 for open-shell.

Excited State Energies and Gradients

In addition to integral storage selection, the judicious use of the restart facilities can improve the economy of CIS and TD calculations.

Integral Storage

Excited states using CI with single excitations can be done using five methods (labeled by their corresponding option to the CIS keyword). Note that only the first two options are available for the TD method:

Direct
Solve for the specified number of states using iterative diagonalization, forming the product vectors from two-electron integrals computed as needed. This algorithm reduces memory and disk requirements to O(N2).

InCore
Requests that the AO Raffenetti combinations be held in memory. In-core is quite efficient, but is only practical for small molecular systems or large memory computers as N4/4 words of memory are required. This approach is used automatically if there is sufficient memory available.

MO
Solve for the specified number of states using iterative (Davidson) diagonalization, forming the product vectors using MO integrals. This is the fastest method and is the default. This algorithm is an efficient choice up to about 150 basis functions, depending on the number of occupied orbitals. The more occupied orbitals, the sooner the direct algorithm should be used. Since only integrals involving two virtuals are needed (even for gradients) an attempt is made to obey MaxDisk. The minimum disk required is about 4O2N2 (6O2N2 for open-shell).

AO
Solve for the specified number of states using iterative diagonalization, forming the product vectors from written-out AO integrals. This is a slow method and is never the best choice.

ICDiag
The entire CIS Hamiltonian matrix is loaded into core and diagonalized. This produces all possible states, but requires O2V2 memory and O3V3 CPU time. Accordingly, it is practical only for very small molecular systems and for debugging purposes.

Restarting Jobs and Reuse of Wavefunctions

CIS and TD jobs can be restarted from a Gaussian checkpoint file. This is of limited use for smaller calculations, which may be performed in the MO basis, as new integrals and transformation must be done, but is invaluable for direct CIS. If a direct CIS job is aborted during the CIS phase, then SCF=Restart should be specified in addition to CIS=Restart or TD=Restart, as the final SCF wavefunction is not moved to its permanent location (suitable for Guess=Read) until the entire job step (or optimization step) completes.

CIS Excited State Densities

If only density analysis is desired, and the excited states have already been found, the CIS density can be recovered from the checkpoint file, using Density=(Check,Current) Guess=Only, which recovers whatever generalized density was stored for the current method (presumably CIS) and repeats the population analysis. Note that the one-particle (unrelaxed) density as well as the generalized (relaxed) density can be examined, but that dipole moments and other properties at the CIS level are known to be much less accurate if the one-particle density is used (i.e., if the orbital relaxation terms are neglected) [108,447]. Consequently, the use of the CIS one-particle density is strongly discouraged, except for comparison with the correct density and with other programs that cannot compute the generalized density.

Separate calculations are required to produce the generalized density for several states, since a CPHF calculation must be performed for each state. To do this, first solve for all the states and the density for the first excited state:

# CIS=(Root=1,NStates=N) Density=Current

if N states are of interest. Then do N-1 additional runs, using a route section of the form:

CIS=(Read,Root=M,NStates=N) Density=Current

for states M=2 through N.

Pitfalls for Open-Shell Excited States

Since the UHF reference state is not an eigenfunction of S2, neither are the excited states produced by CIS or TD [573].

Stability Calculations

Tests of Triplet and Singlet instabilities of RHF and UHF and restricted and unrestricted DFT wavefunctions can be requested using the Stable keyword. The MO, AO, Direct, and InCore options are available, which request the corresponding algorithm. The default is Direct. Direct stability calculations can be restarted as described above for CIS.

CASSCF Efficiency

The primary challenge in using the CASSCF method is selecting appropriate active space orbitals. There are several possible tactics:

  • Use the standard delocalized initial guess orbitals. This is sometimes sufficient, e.g. if the active space consists of all p electrons. Use Guess=Only to inspect the orbitals and determine whether any alterations are required before running the actual calculation.

  • Use localized initial guess orbitals. This is useful if specific bond pairs are to be included, since localization separates electron pairs.

  • Use the natural orbitals from the total density from a UHF calculation (CAS-UNO) [415,416]. For singlets, this requires that one has coaxed the UHF run into converging to a broken symmetry wavefunction (normally with Guess=Mix). It is most useful for complex systems in which it is not clear which electrons are most poorly described by doubly-occupied orbitals.

In all cases, a single-point calculation should be performed before any optimization, so that the converged active space can be checked to ensure that the desired electrons have been correlated before proceeding. There are additional considerations in solving for CASSCF wavefunctions for excited states (see the discussion of the CASSCF keyword for details).

CASSCF Frequencies

CASSCF frequencies require large amounts of memory. Increasing the amount of available memory will always improve performance for CASSCF frequency jobs (the same is not true of frequency calculations performed with other methods). These calculations also require O2N2 disk space.