Parallel calculations: *PARALL

This submodule controls the performance of the parallel version of DALTON. The implementation has been described elsewhere [53]. When PVM is used as the message-passing interface, the keyword .NODES is required; otherwise all keywords are optional.

.DEBUG  
Transfers the print level from the master to the slaves; otherwise the print level on the slaves will always be zero. Only for debugging purposes.
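
Since no READ statement is associated with this keyword, it takes no argument and is simply given on a line of its own inside the submodule. A minimal sketch, assuming the usual DALTON input layout:

*PARALL
.DEBUG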

.DEGREE  

READ (LUCMD,*) NDEGDI

Determines the percentage of the available tasks that is to be handed out in a given distribution of tasks, where a distribution of tasks is defined as the process of giving batches to all slaves. The default is 5%, which ensures that each slave will receive 20 tasks during one integral evaluation, giving reasonable control of the idle time of each slave.
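
A minimal input sketch, assuming the usual convention that the integer is read from the line following the keyword (in accordance with the READ statement above); the value 10, meaning that 10% of the available tasks are handed out in each distribution, is purely illustrative:

.DEGREE
10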

.ENCODE  

READ (LUCMD, '(A7)') WORD

This keyword only applies if PVM has been chosen as the message-passing interface. Three different encoding options exist; the default is PVMDEFA. We refer to the PVM manuals for a closer description of the different ways of encoding the transfer of data between nodes.
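
A minimal input sketch; the option name is assumed to be given as a string on the line following the keyword, in accordance with the A7 READ statement above, and PVMDEFA is the only option named here:

.ENCODE
PVMDEFA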

.NODES  

READ (LUCMD,*) NODES

When MPI is used as the message-passing interface, the default value is the number of nodes that has been assigned to the job, and these nodes will be partitioned into one master and NODES-1 slaves. When PVM is used, the default is 0, and the number of nodes must therefore be specified explicitly; NODES then corresponds to the number of slaves to be used in the calculation. Note that this number may often have to be adjusted in accordance with limits in the queuing systems of various computers.
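
A minimal sketch of the input needed for a PVM calculation, where this keyword is mandatory; the value 8 (i.e. eight slaves) is purely illustrative and is assumed to be given on the line following the keyword, in accordance with the READ statement above:

.NODES
8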

.NTASK  

READ (LUCMD,*) NTASK

The number of tasks to send to each node when distributing the calculation of two-electron integrals. The default is 1. A task is defined as a shell of atomic integrals, a shell being an input block. One may therefore increase the number of shells given to each node in order to reduce the amount of communication. However, the program uses dynamical allocation of work to each node, and this option should therefore be used with some care, as too large tasks may cause the dynamical load balancing to fail, giving an overall decrease in efficiency. The parallelization is also very coarse-grained, so the amount of communication seldom represents any significant problem. This keyword will become obsolete and be replaced by .DEGREE.
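
A minimal input sketch, assuming the integer is given on the line following the keyword as implied by the READ statement above; the value 2, i.e. two shell blocks per message, is purely illustrative:

.NTASK
2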

.PRINT  

READ (LUCMD, '(A7)') KPRINT

Reads the print level for the parallel calculation. A print level of at least 2 is needed in order to evaluate the parallelization efficiency. Complete timings for all nodes will be given if the print level is 4 or higher.
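
A minimal input sketch requesting complete timings for all nodes; the value is assumed to be given on the line following the keyword, in the form implied by the READ statement above:

.PRINT
4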

