Restarting stopped calculations

There are some situations in which you may want to restart a calculation, usually one that didn't converge. This document explains how to do it.

 

Restarting a stopped calculation

A calculation may terminate before converging, for instance due to a power outage, or because a job runs over its allocated walltime in a queue. It may also be that convergence is not reached within the maximum number of iterations. In these cases you will want to restart the job from the point where it stopped.

For this purpose, ATK saves the current state of the calculation to a checkpoint file at regular intervals. The frequency of saving, and the name of the checkpoint file, can be controlled (see the manual for details on this); the default is to save the checkpoint file every 30 minutes. The name of the checkpoint file is always written to the log file.

Simplistic approach

The quickest way to restart a calculation from a checkpoint file is to create and run a small Python script:

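# Load the saved configuration (the first object in the checkpoint file)
configuration = nlread("checkpointfile.nc")[0]
# Resume the self-consistent calculation from the saved state
configuration.update(force_restart=True)
# Save the resulting configuration
nlsave("file.nc", configuration)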

The argument to nlread() should of course be set to the actual checkpoint file name.

The disadvantage is that any analysis in the original script (computing the band structure, for instance) has to be moved into a separate script.

Restarting the original script

Conceptually, a better approach is to rerun the script you already have, but tell it to start from the checkpoint file instead of from scratch. This also retains all analysis defined in the original script. This is possible; you just need to insert the above lines of code in the right place.

Let us assume that you have a "standard" script, produced by the Script Generator in VNL, without too many elaborate steps. That is, a straightforward sequence of "Configuration" and "New Calculator", followed by analysis quantities. In other cases, you can always modify the script in the same way as described here, but you have to take more care to preserve the logic. Special care needs to be taken if the script contains an Initial State block.

Open your original script in the Editor and locate the line

device_configuration.update()

Change this line to

device_configuration.update(force_restart=True)

(For bulk or molecule calculations, the variable will be called molecule_configuration or bulk_configuration instead.)

Then, add the following line before that line:

device_configuration = nlread("checkpointfile.nc")[0]

The argument to nlread() should of course be set to the actual checkpoint file name.

Now you can rerun the script.
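
The two manual edits described above can also be applied programmatically. The following is a minimal sketch in plain Python (it does not need ATK to run); the helper name patch_script_for_restart is hypothetical, and it assumes that the update() call sits on a line of its own, takes no arguments, and that the variable name ends in _configuration. Only the first such line is changed.

```python
import re

# Matches a line such as "device_configuration.update()" (also bulk_ or
# molecule_), capturing the indentation and the variable name.
UPDATE_CALL = re.compile(
    r"^([ \t]*)(\w+_configuration)\.update\(\s*\)[ \t]*$", re.MULTILINE
)


def patch_script_for_restart(script_text, checkpoint_file):
    """Return script_text with the first bare update() call turned into a restart.

    Hypothetical helper: it inserts the nlread() line before the update()
    call and adds force_restart=True, as described in the text.
    """

    def replacement(match):
        indent, name = match.group(1), match.group(2)
        # repr() quotes the file name and escapes backslashes (Windows paths).
        return "{0}{1} = nlread({2!r})[0]\n{0}{1}.update(force_restart=True)".format(
            indent, name, checkpoint_file
        )

    patched, count = UPDATE_CALL.subn(replacement, script_text, count=1)
    if count == 0:
        raise ValueError("no '<name>_configuration.update()' line found")
    return patched
```

Read the original script into a string, pass it through the function together with the actual checkpoint file name, and write the result to a new file, so that the original script is kept unchanged. Scripts with an Initial State block or more elaborate logic should still be edited by hand.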

Notes

  • The default location of the checkpoint file is in the directory specified by the environment variable TEMP.
  • If you are running on a large cluster, you may not have permission to write to the TEMP directory, and even if you do, any files you create in this directory may be deleted automatically when your job finishes - even if the ATK calculation didn't converge. In this case it is important to specify the location of the checkpoint file manually, to make it go into your own directory.
  • The name of the checkpoint file is always written to the log file.
  • The checkpoint file is not written exactly at the specified interval, but only when a step in the self-consistent loop has been completed and the requested time interval has passed.
  • The history of the self-consistent loop is not written to the checkpoint file. Therefore, convergence might become more difficult when restarting, since the mixing algorithm has less information to work with than it normally would.
  • See the manual for details on how to specify the checkpoint file name and time interval.
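
The timing rule in the notes above (a checkpoint is written only after a self-consistent step has finished and the requested interval has passed) can be illustrated with a small stand-alone sketch. This is not ATK's implementation; the function name checkpoint_times is hypothetical, and it assumes that the timer starts with the calculation and restarts after each write.

```python
def checkpoint_times(step_end_times, interval):
    """Return the step end times at which a checkpoint would be written.

    step_end_times: increasing times (e.g. minutes since the start) at which
                    self-consistent steps finish.
    interval:       requested checkpoint interval, in the same unit.
    """
    written = []
    last_write = 0.0  # assumption: the timer starts when the calculation starts
    for t in step_end_times:
        # A write is only considered when a step has just finished, and it
        # only happens if the requested interval has elapsed since the last one.
        if t - last_write >= interval:
            written.append(t)
            last_write = t  # assumption: the timer restarts after each write
    return written
```

With illustrative numbers, steps that finish every 12 minutes and a 30-minute interval give checkpoint_times([12, 24, 36, 48, 60, 72], 30) == [36, 72]. If each step takes a long time, the most recent checkpoint can therefore be noticeably older than the nominal interval suggests.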

Restarting geometry optimizations

Restarting an optimization is considerably more complicated. For a lengthy relaxation it is always a good idea to write a trajectory file; if the calculation is interrupted, you can take one of the later images from the trajectory and set up a new optimization with that geometry as the starting point. Note, however, that some images in a QuasiNewton geometry optimization are "test balloons", which may correspond to very large forces (i.e. a very bad guess), especially during the first 5-10 steps. It can therefore be important to choose an image whose forces are not too large.
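
One possible way to make this choice is to take the latest image whose largest force is below a limit you consider acceptable. The sketch below is plain Python with a hypothetical helper name, pick_restart_image; it assumes you have already extracted the largest force magnitude of each image from the trajectory (how to do that is not covered here), and the limit and its unit are your own choice.

```python
def pick_restart_image(max_forces, force_limit):
    """Return the index of the latest image whose largest force is <= force_limit.

    max_forces:  one number per trajectory image, the largest force
                 magnitude found in that image.
    force_limit: largest force you are willing to accept, in the same unit.
    Returns None if no image qualifies.
    """
    # Walk backwards so that the most recent acceptable image wins.
    for index in range(len(max_forces) - 1, -1, -1):
        if max_forces[index] <= force_limit:
            return index
    return None
```

For example, with the made-up values [2.0, 0.8, 0.5, 3.1, 0.4, 5.0] and a limit of 1.0, the function returns index 4: the last image is skipped as a likely "test balloon", and the one before it is used.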
