Extension

The SlyPI package is designed to be extensible and adaptable for use in different situations, which are described below.

Slycat Interaction

The core functionality of SlyPI allows interaction with the Slycat server through its REST API. This functionality is provided in the base slypi module (e.g. using import slypi).

The core SlyPI functionality is exposed via the utilities described previously, mainly the command line scripts for uploading models to Slycat. These scripts can be used as is or called from within other Python scripts.

Here is an example of uploading a Parameter Space model from within a Python script.

import slypi.ps.upload_csv as ps_csv

# parameter space file
CARS_FILE = ['example-data/cars.csv']

# input/output columns for cars data file
CARS_INPUT = ['--input-columns', 'Model', 'Cylinders', 'Displacement', 'Weight', 'Year']
CARS_OUTPUT = ['--output-columns', 'MPG', 'Horsepower', 'Acceleration']

# create PS model from cars file
ps_parser = ps_csv.parser()
arguments = ps_parser.parse_args(CARS_FILE + CARS_INPUT + CARS_OUTPUT)
ps_csv.create_model(arguments, ps_csv.log)

Ensemble Utilities

A secondary SlyPI functionality serves to manipulate numerical simulation output for the purpose of creating Slycat models. This functionality is exposed in the ensemble utilities submodule (e.g. using import slypi.ensemble). As with the core SlyPI functionality, these utilities can be used from the command line, but are also available from within Python.

These utilities can be used to pull data from particular simulation outputs and manipulate tables used for Slycat imports. They also supply algorithms for data analysis outside of Slycat, and expose algorithms that Slycat uses for some of its model creation (for example the VideoSwarm time-aligned PCA algorithm).

Here is an example of using the SlyPI ensemble table utility to join analysis results for a later import to a Parameter Space model.

import os

# table command line ensemble table utility
import slypi.ensemble.table as table

# create Parameter Space table by joining tables created during analysis
arg_list = ['--join',
            'example-data/spinodal-out/movies.csv',
            'example-data/spinodal-out/end-state.csv',
            'example-data/spinodal-out/auto-PCA-end-state.csv',
            'example-data/spinodal-out/auto-Isomap-end-state.csv',
            '--output-dir', 'example-data/spinodal-out',
            '--ignore-index',
            '--csv-out', 'ps-PCA-Isomap.csv',
            '--csv-no-index',
            '--over-write']
table.main(arg_list)

Plugins

Finally, within the SlyPI ensemble submodule, plugins exist for interacting with particular file formats or particular simulation codes. These routines include code that reads specific file formats, performs file conversion, and handles any particular pre-processing requirements.

The routines are provided to SlyPI by over-riding functions in the PluginTemplate class. The PluginTemplate class provides very basic functionality, such as converting images to movies. Generally speaking, however, the basic template won’t work for a particular simulation.

Plugins can be found in the slypi/ensemble/plugins source directory, but as an example, here is a method that overrides the standard mesh reader in PluginTemplate:

# read npy and sim.npy (also npy) formats
def read_file(self, file_in, file_type=None):

    # check file extension, if not provided
    if file_type is None:

        # npy file type
        if file_in.endswith('.npy'):
            file_type = 'npy'

    # check if we have npy or sim.npy
    if file_type == 'npy':

        # read npy file
        try:
            data = np.load(file_in)
        except ValueError:
            self.log.error("Could not read " + file_in + " as a .npy file.")
            raise ValueError("Could not read " + file_in + " as a .npy file.")

    # otherwise default to mesh
    else:
        data = super().read_file(file_in, file_type)

    return data
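
The override-and-fall-back pattern above can be tried without SlyPI installed. The following is a minimal sketch, assuming a hypothetical stand-in base class (TemplateStandIn) in place of PluginTemplate, and JSON in place of .npy so that only the standard library is needed. None of these names are part of SlyPI.

```python
import json
import os
import tempfile


class TemplateStandIn:
    """Hypothetical stand-in for PluginTemplate (not the SlyPI class)."""

    def read_file(self, file_in, file_type=None):
        # default reader: return the raw text of the file
        with open(file_in, "r") as handle:
            return handle.read()


class Plugin(TemplateStandIn):

    def read_file(self, file_in, file_type=None):
        # infer the file type from the extension, if not provided
        if file_type is None and file_in.endswith(".json"):
            file_type = "json"

        # handle the format this plugin knows about
        if file_type == "json":
            with open(file_in, "r") as handle:
                return json.load(handle)

        # otherwise defer to the base class reader
        return super().read_file(file_in, file_type)


# write one file of each kind and read both through the plugin
work_dir = tempfile.mkdtemp()
json_path = os.path.join(work_dir, "step.json")
text_path = os.path.join(work_dir, "step.txt")
with open(json_path, "w") as handle:
    json.dump({"values": [1, 2, 3]}, handle)
with open(text_path, "w") as handle:
    handle.write("plain text")

plugin = Plugin()
json_data = plugin.read_file(json_path)
text_data = plugin.read_file(text_path)
```

The plugin only handles the format it knows and leaves everything else to the base class, which keeps the override small.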

Plugin API

This section provides an API for the slypi.ensemble utilities in case you want to create your own plugin.

class slypi.ensemble.ArgumentParser(*arguments, **keywords)

Return an instance of argparse.ArgumentParser, pre-configured with arguments used to run ensemble tools.

Command line flags defined by this class are:

  • --log-level

  • --log-file

  • --plugin

The --log-level flag specifies the log level to use for the screen, the --log-file flag specifies a log file to write (debug and above), and the --plugin flag specifies the plugin Python file to use.

Example:

import slypi.ensemble as ensemble

# describe and initialize parser
my_description = "My extension of the parameter space parser."
my_parser = ensemble.ArgumentParser(description=my_description)

# add an argument
my_parser.add_argument('--my-flag', help="My extension command line flag")

parse_args(list_input=None)

Extends standard argparse.parse_args() call. Uses parse_known_args() to parse the base arguments and any additional arguments. Returns any unknown arguments as a list. The unknown arguments can be used to set plugin specific variables.

Example:

# parse command line
args = my_parser.parse_args()

# parse command line and start logger
args, unknown_arg_list = my_parser.parse_args()

# arguments can be accessed using, e.g.
print(args.log_level)

# unknown arguments are returned as a list
print(unknown_arg_list)

Warning:

Python's argparse uses prefix matching, so if a plugin uses an argument flag that matches the prefix of an already existing argument (including any argument defined in a utility), that argument will not be passed on to the plugin.
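
The prefix-matching behavior can be reproduced with plain argparse. This is a minimal sketch using only the standard library. The flag --my-flag and the parser objects are illustrative and are not SlyPI code, and the allow_abbrev=False variant is shown only to illustrate argparse itself, not as an option that SlyPI exposes.

```python
import argparse

# parser with one known flag, standing in for a utility's parser
parser = argparse.ArgumentParser()
parser.add_argument("--log-level", default="info")

# '--log' is an unambiguous prefix of '--log-level', so argparse
# consumes it instead of leaving it in the unknown list
args, unknown = parser.parse_known_args(
    ["--log", "debug", "--my-flag", "value"])

# with abbreviations disabled, '--log' is left in the unknown list
strict_parser = argparse.ArgumentParser(allow_abbrev=False)
strict_parser.add_argument("--log-level", default="info")
strict_args, strict_unknown = strict_parser.parse_known_args(
    ["--log", "debug"])
```

In the first call the plugin never sees --log, which is the situation the warning describes. Choosing plugin flag names that are not prefixes of existing flags avoids it.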


slypi.ensemble.init_logger(log_file=None, log_level='info')

Starts the slypi ensemble logger, sets the logging level for the console, and opens a log file if desired. The console shows only the messages, while the log file also includes time stamps and the origin of each message. The log file is set to debug and above.

Example:

import logging
import slypi.ensemble

# start logger for extension
my_log = logging.getLogger("slypi.my_extension")
my_log.info("My log message.")

# or use slypi ensemble logger
slypi.ensemble.log.info("My message.")
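
The console/file split described for init_logger can be approximated with the standard logging module. The following is a sketch under stated assumptions: start_logger, the logger name, and the format strings are hypothetical and are not taken from the SlyPI source.

```python
import logging
import os
import tempfile


def start_logger(name, log_file=None, log_level="info"):
    """Hypothetical helper: console at log_level, file at debug and above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)

    # console handler: message text only, filtered by the requested level
    console = logging.StreamHandler()
    console.setLevel(getattr(logging, log_level.upper()))
    console.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(console)

    # optional file handler: time stamp, origin, and level, debug and above
    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(file_handler)

    return logger


log_path = os.path.join(tempfile.mkdtemp(), "example.log")
logger = start_logger("example.sketch", log_file=log_path, log_level="warning")

logger.debug("debug detail")         # written to the file only
logger.warning("something to see")   # written to the console and the file

with open(log_path, "r") as handle:
    log_text = handle.read()
```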

class slypi.ensemble.utilities.Table(log, data_frame=None, csv_file=None, ensemble_spec=None, file_spec=None, header=None, no_index=False)

Provides storage and methods for keeping track of simulation data and files in an ensemble. The central assumption is that the ensemble is stored in a directory structure of the form (names are arbitrary):

ensemble
|-- ensemble.info
|-- simulation.1
    |-- simulation.info
    |-- time.step.1
    |-- time.step.2
    |-- ...
|-- simulation.2
|-- simulation.3
...

where ensemble is the central directory containing multiple simulations, each having its own directory simulation.1, simulation.2, and so on. The simulation directories in turn contain files with time step data: time.step.1, time.step.2, etc. The directory names can be somewhat arbitrary and the directories can hold additional subdirectories and files, but the convention for using these utilities is that the simulation directories are specified using a Python-like %d[::] specifier. For example, for the above ensemble, we would use ensemble/simulation.%d[1:] to specify the name format of the simulation folders, and time.step.%d[1:] to specify the time step files within each simulation folder.

The %d[::] notation specifies the order and numbers in the directory/file names using Python slicing conventions. For example, %d specifies all names with an integer in the given location, %d[5:10] specifies all names with an integer starting at 5 and ending at 9, and %d[100:2:-2] specifies all names with an integer starting at 100 and descending by 2 down to 4 (as in a Python slice, the end value 2 is excluded).

The numbers in the directory/file names are assumed to be >= 0 but are otherwise unrestricted.
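
To make the %d[::] convention concrete, here is a minimal sketch of how such a specifier could be matched against a list of names. The function matching_names is hypothetical and is not the SlyPI implementation. It assumes that the end value is excluded, as in Python's range, and it does not support negative start or stop values.

```python
import re

# matches '%d' optionally followed by '[start:stop:step]'
SPEC = re.compile(r"%d(?:\[([^\]]*)\])?")


def matching_names(spec, names):
    """Hypothetical helper: select names matching a %d[start:stop:step] spec."""
    found = SPEC.search(spec)
    if found is None:
        raise ValueError("no %d specifier in " + spec)

    # parse 'start:stop:step'; missing fields are None, as in a Python slice
    fields = (found.group(1) or ":").split(":")
    fields = [int(f) if f else None for f in fields] + [None] * 3
    start, stop, step = fields[:3]
    step = 1 if step is None else step

    # capture the integer that sits where %d[...] appears in the spec
    pattern = re.compile(re.escape(spec[:found.start()]) + r"(\d+)" +
                         re.escape(spec[found.end():]))
    numbers = {}
    for name in names:
        hit = pattern.fullmatch(name)
        if hit:
            numbers[int(hit.group(1))] = name
    if not numbers:
        return []

    # fill in open ends from the numbers actually present
    top = max(numbers)
    if step > 0:
        first = 0 if start is None else start
        last = top + 1 if stop is None else stop
    else:
        first = top if start is None else start
        last = -1 if stop is None else stop

    return [numbers[n] for n in range(first, last, step) if n in numbers]


names = ["time.step.%d" % i for i in range(1, 13)]

selected = matching_names("time.step.%d[5:10]", names)
descending = matching_names("time.step.%d[10:2:-2]", names)
everything = matching_names("time.step.%d", names)
```

Here selected holds steps 5 through 9, descending holds steps 10, 8, 6, and 4, and everything holds all twelve names.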

To instantiate an ensemble.Table object, use either a .csv file to create a full table, or an ensemble_spec, file_spec, and table header to create a single column table.

Parameters:
  • csv_file=None (string) – file name of .csv file

  • ensemble_spec=None (string) – string with %d[::] giving simulation directories

  • file_spec=None (string) – string with %d[::] giving time step file names

  • header=None (string) – name for column header

Note: If csv_file is provided then the other inputs are ignored. If no inputs are provided then an empty table is created.

add_col(col, header)

Adds a column to the ensemble table. This column can be a list or a dictionary. If it is a list, the column is added in the list order. If it is a dictionary, the column is added in the order of the dictionary keys by matching with an existing column. The column is added at the end of the table.

Parameters:
  • col (list or dict) – column data to add

  • header (string) – name of new column

convert_cols(cols, uri_root)

Converts the specified columns in the table to have the given URI root.

Parameters:
  • cols (list) – names of columns to convert

  • uri_root (string) – URI root to use for conversion

Note: Resulting file is output in unix format (forward slashes).

convert_specifier(file_spec, output_dir, output_type)

Returns an output file specifier matching the provided input specifier.

Parameters:
  • file_spec (string) – file path with %d[::] specifier

  • output_dir (string) – output directory for file spec files

  • output_type (string) – file extension of output files

Returns:

output file path with %d[::] specifier

Return type:

out_file_spec (string)

directories(directory_spec)

Return a list of directories matching %d[::] specifier. The specifier is expanded and existing directories are identified and returned.

Parameters:

directory_spec (string) – directory name with %d[::] specifier

Returns:

list of directories matching specifier

Return type:

directory_list (list)

ensemble_files(ensemble_dirs, parallel=False)

Returns a list of lists of files matching %d[::] specifier in the ensemble_dirs. The specifier is expanded and existing files are identified and returned. The directories are expected to exist.

Parameters:
  • ensemble_dirs (list) – list of ensemble directories to expand (with specifier)

  • parallel (boolean) – run in parallel with ipyparallel (default False)

Returns:

list of lists of files matching specifier

Return type:

sim_files (list)

files(file_spec)

Return a list of files matching %d[::] specifier. The specifier is expanded and existing files are identified and returned.

Parameters:

file_spec (string) – file path with %d[::] specifier

Returns:

list of files matching specifier

Return type:

file_list (list)

get_col(col)

Returns a column from the ensemble table.

Parameters:

col (string) – name of column to return

Returns:

list of contents in column

Return type:

col_list (list)

mirror_directories(output_dir, ensemble_dirs, over_write)

Creates a set of directories in the output directory which mirror the ensemble directory structure, unless output directory already exists.

Parameters:
  • output_dir (string) – name of output directory to create

  • ensemble_dirs (list) – list of ensemble directories to mirror

  • over_write (boolean) – true to over-write existing directories

Returns:

list of mirror directories (including output_dir),

None if directories were not created

Return type:

mirror_dirs (list)
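
The behavior described for mirror_directories can be illustrated with the standard library. This is a sketch only: mirror_directories_sketch is a hypothetical helper, and it assumes that each mirror directory takes the name of the corresponding simulation directory, which the text above does not spell out.

```python
import os
import tempfile


def mirror_directories_sketch(output_dir, ensemble_dirs, over_write=False):
    """Hypothetical helper: mirror ensemble directory names under output_dir."""

    # refuse to touch an existing output directory unless asked to
    if os.path.exists(output_dir) and not over_write:
        return None

    os.makedirs(output_dir, exist_ok=True)
    mirror_dirs = [output_dir]

    # one mirror directory per simulation directory, keeping its name
    for sim_dir in ensemble_dirs:
        sim_name = os.path.basename(os.path.normpath(sim_dir))
        mirror_dir = os.path.join(output_dir, sim_name)
        os.makedirs(mirror_dir, exist_ok=True)
        mirror_dirs.append(mirror_dir)

    return mirror_dirs


base = tempfile.mkdtemp()
out_dir = os.path.join(base, "out")
sims = [os.path.join("ensemble", "simulation.1"),
        os.path.join("ensemble", "simulation.2")]

# first call creates the directories, second call finds them and returns None
created = mirror_directories_sketch(out_dir, sims)
skipped = mirror_directories_sketch(out_dir, sims)
```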

to_csv(file_out, output_dir='', cols=None, exc_cols=None, index=True, index_label=None)

Writes out the table to a .csv file.

Parameters:
  • file_out (string) – name of .csv file

  • output_dir (string) – output directory to use for .csv file

  • cols (list) – list of column headers to output

  • exc_cols (list) – list of column headers to exclude from output

  • index (boolean) – write out index column

  • index_label (string) – use as index header


class slypi.ensemble.utilities.EnsembleSpecifierError(specifier, message='invalid %d[::] format')

Exception raised for errors in %d[::] format specifier.

Parameters:
  • specifier (string) – input specifier which caused the error

  • message (string) – explanation of the error


class slypi.ensemble.PluginTemplate(description=None)

Provides an extensible architecture for accommodating different input/output formats, machine learning algorithms, and simulations.

Plugins must be defined as follows in a separate .py file:

Example:

import slypi.ensemble

class Plugin(slypi.ensemble.PluginTemplate):

    ...

See memphis.py for an example.

add_args()

Adds plugin-specific command line arguments to the plugin parser. Note that these flags should not conflict with flags already in use (see parse_args in class ArgumentParser).

Example:

# plugin adds command line argument
self.parser.add_argument("--my_option", help="My option for plugin.")

check_args(args)

Checks plugin arguments and raises exceptions if there are errors.

Parameters:

args (ArgumentParser object) – processed argument list

convert_file(file_in, file_out, file_in_type=None, file_out_type=None)

Converts from file_in to file_out, where file_in can be a string or a buffer. File types are inferred from extensions unless provided. Uses the meshio library.

Parameters:
  • file_in (string) – name of input file

  • file_out (string) – name of output file

  • file_in_type (string) – file input format (regardless of extension)

  • file_out_type (string) – file output format (extension)

convert_files(file_list, output_dir, output_type, input_type=None)

Converts a list of files to files of type output_type in output_dir with the same root name. Input file types are inferred from extensions, unless a type is provided. Output type is also inferred, unless provided.

Parameters:
  • file_list (list) – list of file names to read

  • output_dir (string) – name of output directory to write files

  • output_type (string) – extension of file format for output

  • input_type (string) – file input type (regardless of extension)

Returns:

list of files written using full path

Return type:

files_written (list)

expand(table, header, file_list, **kwargs)

Expands a column in an ensemble table by reading the file links and creating files appropriate to the plugin.

Parameters:
  • table (ensemble Table object) – table with column containing file links

  • header (string or int) – name of column with file links

  • file_list (list) – list of files to expand

  • **kwargs – additional arguments dependent on plugin

init(args)

Initialize any local variables from command line arguments.

Parameters:

args (ArgumentParser object) – processed argument list

Example:

self.my_var = args.my_option

parse_args(arg_list=[])

Parses arguments specific to plugin.

Parameters:

arg_list (list) – list of command line flags and arguments

Returns:

ArgumentParser processed argument list, list of un-recognized arguments

Return type:

args (object), unknown_args (list)

preprocess(data)

Performs data pre-processing specific to a simulation. This code must be provided by the plugin, otherwise the data is returned unchanged. Note that this type of pre-processing is assumed to be per file (e.g. per time step or per simulation). Pre-processing that occurs over the entire ensemble is done by the algorithm codes (e.g. dimension reduction or proxy models).

Parameters:

data (object) – data to be pre-processed

Returns:

pre-processed numpy array with simulations per row

Return type:

data_out (2d array)

read_file(file_in, file_type=None)

Reads a file associated with a single time step in an ensemble. File type is inferred from extension unless provided.

Parameters:
  • file_in (string) – name of file to read

  • file_type (string) – file input type (regardless of extension)

Returns:

file contents

Return type:

data (object)

read_file_batch(batch_files, file_type=None, parallel=False, flatten=True)

Reads a batch of files from an ensemble. File type is inferred from extension unless provided. Can be run in parallel using ipyparallel.

Parameters:
  • batch_files (list) – list of files to read

  • file_type (string) – file input type (regardless of extension)

  • parallel (boolean) – to run in parallel using ipyparallel

  • flatten (boolean) – flatten matrix files to vectors (defaults True)

Returns:

list of file contents

Return type:

data_list (list)

read_input_deck(file_list, file_type=None)

Reads a file or files which provide the input parameters for a simulation. Note that this code must be provided by the plugin.

Parameters:
  • file_list (list) – list of file names to read (can be a list of one file)

  • file_type (string) – file type (regardless of extension)

Returns:

meta data for the simulation

Return type:

file_data (object)

write_file(data, file_out, file_type=None)

Writes time step data from an ensemble to a file. File type is inferred from extension unless provided.

Parameters:
  • data (meshio mesh) – mesh data to be written

  • file_out (string) – file name for output file

  • file_type (string) – file extension


slypi.ensemble.plugin(plugin_name, arg_list=None)

Factory function to instantiate a plugin module from a file and a list of command line arguments.

Parameters:
  • plugin_name (string) – module name (no .py) or file name of module (ending in .py)

  • arg_list (list) – list containing command line flags and arguments

Returns:

plugin as Python namespace, list of un-recognized arguments

Return type:

plugin (object), unknown_args (list)

Example:

# import and initialize plugin
plugin, unknown_args = slypi.ensemble.plugin(args.plugin, arg_list)
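
For readers curious how a factory can accept either a module name or a .py file, the following sketch shows one way to do it with importlib. The function load_plugin_class and the file my_plugin.py are hypothetical. This is not the SlyPI implementation, and it does not handle command line arguments.

```python
import importlib
import importlib.util
import os
import tempfile


def load_plugin_class(plugin_name):
    """Hypothetical helper: return the Plugin class from a module or .py file."""
    if plugin_name.endswith(".py"):
        # load the module directly from the given file path
        module_name = os.path.splitext(os.path.basename(plugin_name))[0]
        spec = importlib.util.spec_from_file_location(module_name, plugin_name)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # otherwise treat the name as an importable module
        module = importlib.import_module(plugin_name)
    return module.Plugin


# write a tiny plugin file and load it by file name
plugin_dir = tempfile.mkdtemp()
plugin_file = os.path.join(plugin_dir, "my_plugin.py")
with open(plugin_file, "w") as handle:
    handle.write(
        "class Plugin:\n"
        "    def describe(self):\n"
        "        return 'example plugin'\n")

PluginClass = load_plugin_class(plugin_file)
description = PluginClass().describe()
```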

class slypi.ensemble.algorithms.reduction.DimensionReduction(arg_list=None, model_file=None)

This class contains wrappers for dimension reduction algorithms in scikit-learn for use with the slypi ensemble tools. It includes its own parser to specify algorithms and algorithm parameters.

Parameters:
  • arg_list (list) – list of arguments specific to reduction

  • model_file (string) – name of model file containing reduction

Example:

# get parser and reduction algorithm code
import slypi.ensemble as ensemble
import slypi.ensemble.algorithms.reduction as algorithms

# parse command line
my_parser = ensemble.ArgumentParser()

# parse command line and start logger
args, arg_list = my_parser.parse_args()

# set up dimension reduction algorithm using command line arguments
algorithm = algorithms.DimensionReduction(arg_list=arg_list)

# use time_align to use a time-aligned model, where time_align
# specifies the number of dimensions to use per time step
time_aligned_algorithm = algorithms.DimensionReduction(time_align=10)

# set up data in variable X, data points per row

# do dimension reduction (ala sklearn)
algorithm.fit(X)

# reduce data to lower dimension
reduced_data = algorithm.transform(X)

data_explained()

Returns the percent of information captured per dimension per model. For PCA this would be the explained variance ratio. If a model doesn’t compute this information, an empty list is returned.

Returns:

list of vectors of percent captured

Return type:

data_explained (list)

fit(data, time_step=0)

Train dimension reduction model using samples.

Parameters:
  • data (array) – data with points as rows

  • time_step (int) – train model at given time step

has_inverse()

Check if user selected algorithm has an inverse.

Returns:

True if algorithm has an inverse

Return type:

has_inverse (bool)

is_incremental()

Test if the user selected an incremental algorithm.

Returns:

True if algorithm can be used in batch mode

Return type:

is_incremental (bool)

load(file_in)

Loads a dimension reduction model from a .pkl file.

Parameters:

file_in (string) – file name with saved model

num_dim()

Get desired number of dimensions in reduction.

Returns:

number of dimensions for desired reduction

Return type:

num_dim (int)

partial_fit(data, time_step=0)

Train an incremental model using samples.

Parameters:
  • data (array) – data with points as rows

  • time_step (int) – model time step

save(file_out)

Saves a dimension reduction model to a .pkl file.

Parameters:

file_out (string) – file name to save file
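
The .pkl extension suggests Python's pickle format, although the text does not say how SlyPI serializes its models. As a rough illustration of a save/load round trip in that format, the sketch below pickles a plain dictionary standing in for a fitted model. The dictionary contents and the file name are made up for the example.

```python
import os
import pickle
import tempfile

# hypothetical "model": plain Python data standing in for a fitted reduction
model = {"algorithm": "PCA", "num_dim": 2,
         "components": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}

model_file = os.path.join(tempfile.mkdtemp(), "model.pkl")

# save the model to a .pkl file
with open(model_file, "wb") as handle:
    pickle.dump(model, handle)

# load it back (only unpickle files from sources you trust)
with open(model_file, "rb") as handle:
    restored = pickle.load(handle)
```

Pickle files can execute code when loaded, so model files should be loaded only if they come from a trusted source.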

time_align(data, compute_rotations=True)

Time align reduced data using the Kabsch algorithm. Expects the incoming dimension to be time_align_dim and truncates the dimension to num_dim after alignment.

Parameters:
  • data (list of array) – list of data matrices of shape (sim, dim)

  • compute_rotations (boolean) – False to use existing rotation matrices

Returns:

list of data with shape (sim, reduced dim)

Return type:

aligned_data (list of array)
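
For reference, the standard Kabsch problem can be stated as follows. This is the textbook formulation, with the rows of P and Q taken as corresponding points (one simulation per row). How SlyPI chooses the pair of matrices to align, and whether it centers the data or applies the reflection correction d, is not stated here and is assumed.

```latex
% Standard Kabsch problem: rows of P and Q are corresponding points (sim x dim)
\begin{aligned}
R^{*} &= \arg\min_{R^{\top}R = I,\ \det R = 1} \lVert P R - Q \rVert_F^2, \\
H &= P^{\top} Q = U \Sigma V^{\top} \quad \text{(singular value decomposition)}, \\
d &= \operatorname{sign}\bigl(\det(U V^{\top})\bigr), \\
R^{*} &= U \,\operatorname{diag}(1, \dots, 1, d)\, V^{\top}.
\end{aligned}
```

The aligned data is then P R*, which is the rotation of P that lies closest to Q in the least-squares sense.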

time_align_dim()

Get number of dimensions to use for time alignment.

Returns:

number of time alignment dimensions

Return type:

time_align (int)

transform(data, time_step=0)

Transform data to lower dimensional representation.

Parameters:
  • data (numpy array) – data with points as rows

  • time_step (int) – transform data at given time step

Returns:

reduced data

Return type:

data (array)