This submodule controls the performance of the parallel version of DALTON. The implementation has been described elsewhere [53]. When PVM is used as the message-passing interface, the keyword .NODES is required; otherwise, all keywords are optional.
Transfers the print level from the master to the slaves; otherwise, the print level on the slaves will always be zero. Intended for debugging purposes only.
READ (LUCMD,*) NDEGDI
Determines the percentage of the available tasks that is to be distributed in a given distribution of tasks, where a distribution of tasks is defined as the process of handing out batches to all slaves. The default is 5%, which ensures that each slave will receive about 20 tasks during one integral evaluation (100%/5% = 20 distributions), giving reasonable control over the idle time of each slave.
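As a hedged illustration, a finer-grained distribution could be requested as follows; the *PARALLEL heading is our assumption, since the submodule name is not shown in this excerpt:

    *PARALLEL
    .DEGREE
    2

With 2% of the tasks handed out per distribution, the work is spread over 100/2 = 50 distributions, so each slave receives roughly 50 batches per integral evaluation, at the cost of more frequent communication with the master.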
READ (LUCMD, '(A7)') WORD
This keyword applies only if PVM has been chosen as the message-passing interface. Three different options exist:
PVMDEFA
PVMRAW
PVMINPL
The default is PVMDEFA. We refer to the PVM manuals for a closer description of the different ways of encoding the transfer of data between nodes.
READ (LUCMD,*) NODES
When MPI is used as the message-passing interface, the default value is the number of nodes that has been assigned to the job, and these nodes will be partitioned into one master and NODES-1 slaves. When PVM is used, the default is 0, and the number of nodes must therefore be specified explicitly; in such calculations NODES corresponds to the number of slaves the user would like to use. Note that this number may often have to be adjusted in accordance with the limits imposed by the queuing systems of various computers.
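As an illustration, a PVM calculation with 8 slaves (and thus 9 processes, including the master) could be requested as follows; the *PARALLEL heading is again our assumption:

    *PARALLEL
    .NODES
    8

Under MPI the keyword is optional, since NODES then defaults to the number of nodes assigned to the job.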
READ (LUCMD,*) NTASK
The number of tasks to send to each node when distributing the calculation of two-electron integrals. The default is 1. A task is defined as a shell of atomic integrals, a shell being an input block. One may therefore increase the number of shells given to each node in order to reduce the amount of communication. However, the program uses dynamic allocation of work to each node, so this option should be used with some care: tasks that are too large may cause the dynamic load balancing to fail, giving an overall decrease in efficiency. The parallelization is also very coarse grained, so the amount of communication seldom represents a significant problem. This keyword will become obsolete and be replaced by .DEGREE.
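A hedged sketch of its use, assuming the keyword is spelled .NTASK after the variable it sets (the keyword name itself is not shown in this excerpt):

    *PARALLEL
    .NTASK
    4

This would send four shells of integrals to a node in each batch, reducing the number of messages at the risk of coarser dynamic load balancing, as discussed above.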
READ (LUCMD, '(A7)') KPRINT
Reads the print level for the parallel calculation. A print level of at least 2 is needed in order to evaluate the parallelization efficiency. Complete timings for all nodes will be given if the print level is 4 or higher.