Nodes

The building blocks provided with BioSimSpace can be used to write interoperable workflow components, or nodes. Typically, a node will perform a single, well-defined unit of work with clear inputs and outputs. The BioSimSpace.Gateway package acts as a bridge between BioSimSpace and the outside world, allowing a user to construct a node and define its input and output requirements, along with restrictions on their types and values.

Writing nodes

As an example, the following code snippet shows how we could write a node to perform an energy minimisation on a molecular system loaded from file:

import BioSimSpace as BSS

# Initialise the Node object.
node = BSS.Gateway.Node("Minimise a molecular system and save to file.")

# Set the node author and license.
node.addAuthor(name="Lester Hedges",
               email="lester.hedges@bristol.ac.uk",
               affiliation="University of Bristol")
node.setLicense("GPLv3")

# Set the node inputs.
node.addInput("files", BSS.Gateway.FileSet(help="A set of molecular input files."))
node.addInput("steps", BSS.Gateway.Integer(help="The number of minimisation steps.",
                                           minimum=0, maximum=1000000, default=10000))

# Set the node outputs.
node.addOutput("minimised", BSS.Gateway.FileSet(help="The minimised molecular system."))

# Show the graphical user interface (GUI) to allow the user to set the inputs.
# This will only happen if running interactively, i.e. in a Jupyter notebook.
node.showControls()

# Load the molecular system using the user defined input "files".
system = BSS.IO.readMolecules(node.getInput("files"))

# Define the minimisation protocol using the user defined number of "steps".
protocol = BSS.Protocol.Minimisation(steps=node.getInput("steps"))

# Execute the process using any available molecular dynamics engine.
process = BSS.MD.run(system, protocol)

# Set the node output to the final configuration of the minimisation process.
# Note that we pass block=True to the getSystem call to ensure that the
# process finished before getting the final configuration. (It is possible
# to query the running process in real time when running interactively.)
# Note that the original file format of the system is preserved on write.
node.setOutput("minimised", BSS.IO.saveMolecules("minimised",
    process.getSystem(block=True), system.fileFormat()))

# Finally, validate the node to make sure that outputs are set correctly
# and no errors have been raised. If running interactively, this will
# generate a download link to a zip file containing the node outputs.
node.validate()

Running nodes

BioSimSpace nodes are flexible in the way in which they can be used, with the same script working seamlessly from within a Jupyter notebook or on the command line. Typically, a user would write a node as a fully documented, interactive Jupyter notebook, then save it as a regular Python script to run from the command line. (In the section above we included a Python script representation of the node, which could be re-converted to a notebook using, e.g., p2j.) Any purely interactive elements included in the node, e.g. visualisations and plots, are simply ignored when the script is run in a non-interactive mode. To facilitate this dual use, the node.addInput method generates a custom ipywidgets-based graphical user interface for interactive use in Jupyter, or a custom argparse parser for handling command-line arguments. The following figure shows the example node above running within a Jupyter notebook (top) and from the command line (bottom).

_images/node.png
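To make the command-line side of this concrete, the following is a minimal sketch of how input requirements with a minimum, maximum and default could be turned into argparse options. It is not BioSimSpace's implementation: the REQUIREMENTS dictionary and the bounded and build_parser helpers are hypothetical names introduced for illustration, and only the input names, help strings and limits are taken from the example node above.

```python
import argparse

# Hypothetical, plain-dictionary description of the example node's inputs.
# These are illustrative stand-ins, not BioSimSpace.Gateway objects.
REQUIREMENTS = {
    "files": {"type": str, "nargs": "+",
              "help": "A set of molecular input files."},
    "steps": {"type": int, "help": "The number of minimisation steps.",
              "minimum": 0, "maximum": 1000000, "default": 10000},
}


def bounded(kind, minimum=None, maximum=None):
    """Return an argparse 'type' callable that converts and range-checks a value."""
    def convert(text):
        value = kind(text)
        if minimum is not None and value < minimum:
            raise argparse.ArgumentTypeError(
                f"{value} is less than the minimum of {minimum}")
        if maximum is not None and value > maximum:
            raise argparse.ArgumentTypeError(
                f"{value} is greater than the maximum of {maximum}")
        return value
    return convert


def build_parser(requirements):
    """Create one command-line option per requirement."""
    parser = argparse.ArgumentParser(
        description="Minimise a molecular system and save to file.")
    for name, spec in requirements.items():
        kwargs = {
            "help": spec["help"],
            "type": bounded(spec["type"], spec.get("minimum"), spec.get("maximum")),
        }
        if "nargs" in spec:
            kwargs["nargs"] = spec["nargs"]
        # Inputs without a default must be supplied by the user.
        if "default" in spec:
            kwargs["default"] = spec["default"]
        else:
            kwargs["required"] = True
        parser.add_argument(f"--{name}", **kwargs)
    return parser


parser = build_parser(REQUIREMENTS)
args = parser.parse_args(
    ["--files", "amber/ala.crd", "amber/ala.top", "--steps", "1000"])
print(args.files, args.steps)
```

With this sketch, a value outside the allowed range (for example --steps=2000000) is rejected by the parser before any simulation code runs, which is the kind of early validation that a requirement's restrictions are meant to provide.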

When working interactively, BioSimSpace also provides functionality for interacting with processes while they are running. This allows the user to monitor the progress of a simulation and generate near real-time plots and visualisations.

While BioSimSpace isn’t intended to be a workflow manager, it does provide a means of chaining nodes together by passing the output of one node as the input to another. For example, given the following YAML configuration file, config.yaml:

files:
- amber/ala.crd
- amber/ala.top

it would be possible to run a minimisation followed by an equilibration as follows:

python minimisation.py --config config.yaml && python equilibration.py --config output.yaml

Nodes can also be accessed from within BioSimSpace itself, allowing the user access to existing functionality as building blocks for more complex scripts. For example, the minimisation node can be run from within BioSimSpace as follows:

import BioSimSpace as BSS

# Create a dictionary of inputs to the node.
inputs = {"files": ["amber/ala.crd", "amber/ala.top"], "steps": 1000}

# Run the node and capture the output as a dictionary.
output = BSS.Node.run("minimisation", inputs)

(Note that, by default, a node library is created in a hidden _nodes directory wherever BioSimSpace is installed, e.g. /usr/local/lib/python3.7/site-packages/BioSimSpace/Nodes/_nodes. To set a custom directory, use the BioSimSpace.Node.setNodeDirectory function.)

Common Workflow Language

It is also possible to export a node as a Common Workflow Language (CWL) wrapper. For example, using the minimisation.py example from the previous section:

python minimisation.py --export-cwl

This will write a wrapper called minimisation.cwl to the current directory.

The node could then be run as part of a CWL workflow using something like:

cwltool minimisation.cwl config.yaml

Here config.yaml is a YAML configuration file, e.g.:

files:
  - {class: File, path: /home/lester/BioSimSpace/demo/amber/ala/ala.top}
  - {class: File, path: /home/lester/BioSimSpace/demo/amber/ala/ala.crd}

steps:
  1000

At present, using BioSimSpace within CWL is limited to the use of BioSimSpace.Gateway.File and BioSimSpace.Gateway.FileSet requirements, which cover the majority of use cases. Due to the way in which CWL works, the prefix used for output files must match the name used for the requirement, e.g. if a requirement was called output, then a file might be named output.txt. This allows the use of glob in the CWL outputBinding functionality. This requirement is automatically enforced so that files will be renamed when a mismatch is found.
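The prefix rule can be illustrated with a short sketch of a renaming step. This is not BioSimSpace's own code: enforce_prefix is a hypothetical helper, and the file and requirement names are made up for the example. It only shows why a matching prefix lets a simple glob pattern pick up the output file.

```python
from pathlib import Path
import tempfile


def enforce_prefix(path, requirement_name):
    """Rename 'path' so that its file name starts with 'requirement_name'."""
    path = Path(path)
    if path.name.startswith(requirement_name):
        # Nothing to do: the prefix already matches.
        return path
    # Keep the file's suffixes and swap the stem for the requirement name.
    target = path.with_name(requirement_name + "".join(path.suffixes))
    path.rename(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # A file whose name does not match the (hypothetical) requirement "output".
    original = Path(tmp) / "result.txt"
    original.write_text("data")

    renamed = enforce_prefix(original, "output")

    # After renaming, a glob on the requirement name finds the file.
    matches = sorted(p.name for p in Path(tmp).glob("output*"))
    print(renamed.name, matches)
```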

Any unit-based input requirement, e.g. BioSimSpace.Gateway.Length, should be specified as a CWL string type to allow for the greatest flexibility, e.g.:

length:
  25 Angstroms
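Passing the value as a string keeps the magnitude and the unit together, so the node can interpret both. As a rough illustration of that idea, the sketch below splits such a string into a number and a unit and converts it to Angstroms. The parse_length function, the regular expression and the small unit table are assumptions made for this example; they do not describe how BioSimSpace parses its own requirements or which unit spellings it accepts.

```python
import re

# Illustrative conversion factors to Angstroms. The accepted spellings are an
# assumption for this sketch, not BioSimSpace's actual list.
_TO_ANGSTROM = {"angstrom": 1.0, "nanometer": 10.0, "picometer": 0.01}

# A number (optionally signed, optionally in scientific notation), optional
# whitespace, then an alphabetic unit name.
_PATTERN = re.compile(r"^\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*([A-Za-z]+)\s*$")


def parse_length(text):
    """Convert a string such as '25 Angstroms' into a value in Angstroms."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"Could not parse length: {text!r}")
    value = float(match.group(1))
    unit = match.group(2).lower()
    # Accept plural forms by dropping a single trailing "s".
    if unit.endswith("s"):
        unit = unit[:-1]
    if unit not in _TO_ANGSTROM:
        raise ValueError(f"Unsupported length unit: {match.group(2)!r}")
    return value * _TO_ANGSTROM[unit]


print(parse_length("25 Angstroms"))
print(parse_length("2.5 nanometers"))
```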

Forwards compatibility

To ensure that BioSimSpace nodes are forwards compatible as new features are added, all subpackages can query their own functionality and present this to the user. For example, calling BioSimSpace.IO.fileFormats returns a list of the currently supported molecular file formats, BioSimSpace.Solvent.waterModels returns a list of the supported water models, etc. These values can be passed as the allowed keyword argument when setting an input requirement of a node, ensuring that the node supports the latest functionality of the package version that is installed. The following code snippet shows a node that can be used to convert to any supported molecular file format, which will continue to work as additional formats are added.

import BioSimSpace as BSS

# Initialise the Node object.
node = BSS.Gateway.Node("Convert between molecular file formats.")

# Set the node author and license.
node.addAuthor(name="Lester Hedges",
               email="lester.hedges@bristol.ac.uk",
               affiliation="University of Bristol")
node.setLicense("GPLv3")

# Set the node inputs.
node.addInput("files", BSS.Gateway.FileSet(help="A set of molecular input files."))
node.addInput("file_format", BSS.Gateway.String(help="The format to convert to.",
                                                allowed=BSS.IO.fileFormats()))

# Set the node outputs.
node.addOutput("converted", BSS.Gateway.File(help="The converted file."))

# Show the graphical user interface to allow the user to set the inputs.
# This will only happen if running interactively, i.e. in a Jupyter notebook.
node.showControls()

# Load the molecular system using the user defined input "files".
system = BSS.IO.readMolecules(node.getInput("files"))

# Convert the system to the chosen format and set the output.
node.setOutput("converted",
    BSS.IO.saveMolecules("converted", system, node.getInput("file_format")))

# Validate the node.
node.validate()

The following figure shows how the allowed=BSS.IO.fileFormats() argument is translated into a dropdown menu for the Jupyter GUI (top), or into the choices option of argparse on the command line (bottom). This means that the script adapts to additional file parsers in the future without needing modification.

_images/forwards_compatibility.png
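The command-line half of this behaviour can be sketched with argparse alone. In the example below, make_parser is a hypothetical helper, and the two format lists are illustrative stand-ins for what BSS.IO.fileFormats() might return in an older and a newer release; the names in them are assumptions, not the library's actual output. The point is that the script itself does not change when the list grows.

```python
import argparse


def make_parser(formats):
    """Build a parser whose --file_format option only accepts 'formats'."""
    parser = argparse.ArgumentParser(
        description="Convert between molecular file formats.")
    parser.add_argument("--file_format", required=True, choices=formats,
                        help="The format to convert to.")
    return parser


# Stand-in for the list returned by BSS.IO.fileFormats(); illustrative values.
formats_now = ["PDB", "PRM7", "RST7"]

# Hypothetical list from a later release that has gained one extra format.
formats_later = formats_now + ["NEWFORMAT"]

# The same parser-building code accepts the new format once it is in the list.
chosen = make_parser(formats_later).parse_args(
    ["--file_format", "NEWFORMAT"]).file_format
print(chosen)
```

Because the allowed values are queried at run time rather than hard-coded, the older list rejects the new format while the newer list accepts it, with no edits to the node script.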

Examples

A set of example nodes is available on our notebook server.