This submodule controls the performance of the parallel version of DALTON. The implementation has been described elsewhere [53]. When PVM is used as the message-passing interface, the keyword .NODES is required; otherwise, all keywords are optional.
Transfers the print level from the master to the slaves; otherwise, the print level on the slaves will always be zero. Intended for debugging purposes only.
READ (LUCMD,*) NDEGDI
Determines the percentage of the available tasks that is to be distributed in a given distribution of tasks, where a distribution of tasks is defined as the process of handing out batches to all slaves. The default is 5%, which ensures that each slave will receive about 20 tasks during one integral evaluation (100%/5% = 20 distributions), giving reasonable control over the idle time of each slave.
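As a hedged illustration, a finer-grained distribution could be requested as follows; the *PARALLEL heading is our assumption, since the submodule name is not shown in this excerpt:

    *PARALLEL
    .DEGREE
    2

With 2% of the tasks handed out per distribution, the work is spread over 100/2 = 50 distributions, so each slave receives roughly 50 batches per integral evaluation, at the cost of more frequent communication with the master.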
READ (LUCMD, '(A7)') WORD
This keyword applies only if PVM has been chosen as the message-passing interface. Three different options exist:
PVMDEFA
PVMRAW
PVMINPL
The default is PVMDEFA. We refer to the PVM manuals for a closer description of the different ways of encoding the transfer of data between nodes.
READ (LUCMD,*) NODES
When MPI is used as the message-passing interface, the default value is the number of nodes that has been assigned to the job, and these nodes will be partitioned into one master and NODES-1 slaves. When PVM is used, the default is 0, and the number of nodes must therefore be specified explicitly; in such calculations NODES corresponds to the number of slaves the user would like to use. Note that this number may often have to be adjusted in accordance with the limits imposed by the queuing systems of various computers.
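As an illustration, a PVM calculation with 8 slaves (and thus 9 processes, including the master) could be requested as follows; the *PARALLEL heading is again our assumption:

    *PARALLEL
    .NODES
    8

Under MPI the keyword is optional, since NODES then defaults to the number of nodes assigned to the job.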
READ (LUCMD,*) NTASK
The number of tasks to send to each node when distributing the calculation of two-electron integrals. The default is 1. A task is defined as a shell of atomic integrals, a shell being an input block. One may therefore increase the number of shells given to each node in order to reduce the amount of communication. However, the program uses dynamic allocation of work to each node, so this option should be used with some care: tasks that are too large may cause the dynamic load balancing to fail, giving an overall decrease in efficiency. The parallelization is also very coarse grained, so the amount of communication seldom represents a significant problem. This keyword will become obsolete and be replaced by .DEGREE.
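A hedged sketch of its use, assuming the keyword is spelled .NTASK after the variable it sets (the keyword name itself is not shown in this excerpt):

    *PARALLEL
    .NTASK
    4

This would send four shells of integrals to a node in each batch, reducing the number of messages at the risk of coarser dynamic load balancing, as discussed above.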
READ (LUCMD, '(A7)') KPRINT
Reads the print level for the parallel calculation. A print level of at least 2 is needed in order to evaluate the parallelization efficiency. Complete timings for all nodes will be given if the print level is 4 or higher.