Friday, April 18, 2014

Sumatra 0.6 released

We would like to announce the release of version 0.6.0 of Sumatra, a tool for automated tracking of simulations and computational analyses so as to be able to easily replicate them at a later date.

This version of Sumatra sees many new features, improvements to existing ones, and bug fixes:

New export formats: LaTeX and bash script

You can now export your project history as a LaTeX document, allowing you to easily generate a PDF listing details of all the simulations or data analyses you've run.

    $ smt list -l -f latex > myproject.tex

Using tags, you can also restrict the output to only a subset of the history.

You can also generate a shell script, which can be executed to repeat the sequence of computations captured by Sumatra, or saved as an executable record of your work. The shell script includes all version control commands needed to ensure the correct version of the code is used at each step.

    $ smt list -l -f shell > myproject.sh

Working with multiple VCS branches

Previous versions of Sumatra would always update to the latest version of the code in your version control repository before running the computation.

Now, "smt run" will never change the working copy, which makes it much easier to work with multiple version control branches and to go back to running earlier versions of your code.

Programs that read from stdin or write to stdout

Programs that read from standard input or write to standard output can now be run with "smt run". For example, if the program is normally run using:

    $ myprog < input > output

You can run it with Sumatra using:

    $ smt run -e myprog -i input -o output

Improvements to the Web browser interface

There have been a number of small improvements to the browser interface:

  • a slightly more compact and easier-to-read (table-like) layout for parameters;
  • the working directory is now displayed in the record detail view;
  • the interface now works if record label contain spaces or forward slashes.

The minimal Django version needed is now 1.4.

Support for PostgreSQL

The default record store for Sumatra is based on the Django ORM, using SQLite as the backend. It is now possible to use PostgreSQL instead of SQLite, which gives better performance and allows the Sumatra record store to be placed on a separate server.

To set up a new Sumatra project using PostgreSQL, you will first have to create a database using the PostgreSQL tools ("psql", etc.). You then configure Sumatra as follows:

    $ smt init --store=postgres://username:password@hostname/databasename MyProject

Note that the database tables will not be created until after the first "smt run".

Integration with the SLURM resource manager

Preliminary support for launching MPI computations via SLURM has been added.

  $ smt configure --launch_mode=slurm-mpi
  $ smt run -n 256 input_file1 input_file2

This will launch 256 tasks using "salloc" and "mpiexec".

Command-line options for SLURM can also be set using "smt configure"

  $ smt configure --launch_mode_options=" --tasks-per-node=1"

If you are using "mpiexec" on its own, without a resource manager, you can set MPI command-line options in the same way.

Improvements to the @capture decorator

The "@capture" decorator, which makes it easy to add Sumatra support to Python scripts (as an alternative to using the "smt" command), now captures stdout and stderr.

Improvements to parameter file handling

Where a parameter file has a standard mime type (like json, yaml), Sumatra uses the appropriate extension if rewriting the file (e.g. to add parameters specified on the command line), rather than the generic ".param".

Migrating data files

If you move your input and/or output data files, either within the filesystem on your current computer or to a new computer, you need to tell Sumatra about it so that it can still find your files. For this there is a new command, "smt migrate". This command also handles changes to the location of the data archive, if you are using one, and changes to the base URL of any mirrored data stores. For usage information, run:

    $ smt help migrate

Miscellaneous improvements

  • "smt run" now passes unknown keyword args on to the user program. There is also a new option '--plain' which prevents arguments of the form "x=y" being interpreted by Sumatra; instead they are passed straight through to the program command line.
  • When repeating a computation, the label of the original is now stored in the 'repeats' attribute of 'Record', rather than appending "_repeat" to the original label. The new record will get a new unique label, or a label specified by the user. This means a record can be repeated more than once, and is a more reliable method of indicating a repeat.
  • A Sumatra project now knows the version of Sumatra with which it was created.
  • "smt list" now has an option '-r'/'--reverse' which lists records oldest to newest.

Bug fixes

A fair number of bugs have been fixed.

Download, support and documentation

The easiest way to get the latest version of Sumatra is

  $ pip install sumatra

Alternatively, Sumatra 0.6.0 may be downloaded from PyPI or from the INCF Software Center. Support is available from the sumatra-users Google Group. Full documentation is available on pythonhosted.org.

No comments: