Extension¶
The SlyPI package is designed to be extensible and adaptable for use in different situations, which are described below.
Slycat Interaction¶
The core functionality of SlyPI allows interaction with the Slycat server
through its REST API. This functionality is provided in the base slypi module
(e.g. using import slypi).
The core SlyPI functionality is exposed via the utilities described previously, consisting mainly of the command line scripts available to upload models to Slycat. These scripts can be used as is or can be called from within other Python scripts.
Here is an example of uploading a Parameter Space model from within a Python script.
import slypi.ps.upload_csv as ps_csv
# parameter space file
CARS_FILE = ['example-data/cars.csv']
# input/output columns for cars data file
CARS_INPUT = ['--input-columns', 'Model', 'Cylinders', 'Displacement', 'Weight', 'Year']
CARS_OUTPUT = ['--output-columns', 'MPG', 'Horsepower', 'Acceleration']
# create PS model from cars file
ps_parser = ps_csv.parser()
arguments = ps_parser.parse_args(CARS_FILE + CARS_INPUT + CARS_OUTPUT)
ps_csv.create_model(arguments, ps_csv.log)
Ensemble Utilities¶
A secondary SlyPI functionality serves to manipulate numerical simulation output for the
purpose of creating Slycat models. This functionality is exposed in the ensemble utilities
submodule (e.g. using import slypi.ensemble). As with the core SlyPI functionality, these
utilities can be used from the command line, but are also available from within python.
These utilities can be used to pull data from particular simulation outputs and manipulate tables used for Slycat imports. They also supply algorithms for data analysis outside of Slycat, and simultaneously expose algorithms that Slycat uses for some of its model creation (for example the VideoSwarm time-aligned PCA algorithm).
Here is an example of using the SlyPI ensemble table utility to join analysis results for a later import to a Parameter Space model.
import os
# command line ensemble table utility
import slypi.ensemble.table as table
# create Parameter Space table by joining tables created during analysis
arg_list = ['--join',
            'example-data/spinodal-out/movies.csv',
            'example-data/spinodal-out/end-state.csv',
            'example-data/spinodal-out/auto-PCA-end-state.csv',
            'example-data/spinodal-out/auto-Isomap-end-state.csv',
            '--output-dir', 'example-data/spinodal-out',
            '--ignore-index',
            '--csv-out', 'ps-PCA-Isomap.csv',
            '--csv-no-index',
            '--over-write']
table.main(arg_list)
Plugins¶
Finally, within the SlyPI ensemble submodule, plugins exist for interacting with particular file formats or particular simulation codes. These routines include code that reads specific file formats, performs file conversion, and handles any particular pre-processing requirements.
The routines are provided to SlyPI by over-riding functions in the PluginTemplate class.
The PluginTemplate class provides very basic functionality, such as converting images to movies.
Generally speaking, however, the basic template won’t work for a particular simulation.
Plugins can be found in the slypi/ensemble/plugins source directory, but as an example,
here is a subroutine which over-rides the standard mesh reader in PluginTemplate:
# read npy and sim.npy (also npy) formats
def read_file(self, file_in, file_type=None):
    # check file extension, if not provided
    if file_type is None:
        # npy file type
        if file_in.endswith('.npy'):
            file_type = 'npy'
    # check if we have npy or sim.npy
    if file_type == 'npy':
        # read npy file
        try:
            data = np.load(file_in)
        except ValueError:
            self.log.error("Could not read " + file_in + " as a .npy file.")
            raise ValueError("Could not read " + file_in + " as a .npy file.")
    # otherwise default to mesh
    else:
        data = super().read_file(file_in, file_type)
    return data
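To show where such a method sits inside a plugin class, the sketch below reproduces the same override pattern with a stand-in base class. TemplateStub and the JSON handling are hypothetical and only mimic the shape of PluginTemplate so that the example runs without SlyPI; a real plugin would subclass slypi.ensemble.PluginTemplate instead.

```python
import json
import os
import tempfile


class TemplateStub:
    """Hypothetical stand-in for PluginTemplate (not the SlyPI class)."""

    def read_file(self, file_in, file_type=None):
        # the stand-in default reader just returns the raw text
        with open(file_in) as handle:
            return handle.read()


class Plugin(TemplateStub):
    """Overrides read_file for one format and defers everything else."""

    def read_file(self, file_in, file_type=None):
        # infer the file type from the extension, if not provided
        if file_type is None and file_in.endswith('.json'):
            file_type = 'json'
        # handle the format this plugin knows about
        if file_type == 'json':
            with open(file_in) as handle:
                return json.load(handle)
        # otherwise fall back to the base class reader
        return super().read_file(file_in, file_type)


# small demonstration using temporary files
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, 'step.json')
    text_path = os.path.join(tmp_dir, 'step.txt')
    with open(json_path, 'w') as handle:
        json.dump([1, 2, 3], handle)
    with open(text_path, 'w') as handle:
        handle.write('plain text')
    plugin = Plugin()
    json_data = plugin.read_file(json_path)
    text_data = plugin.read_file(text_path)
```

The subclass only needs to know its own format; anything else is passed unchanged to the parent reader, which is the same design as the npy example above.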
Plugin API¶
This section provides an API for the slypi.ensemble utilities in case you want to create your own plugin.
- class slypi.ensemble.ArgumentParser(*arguments, **keywords)¶
Return an instance of argparse.ArgumentParser, pre-configured with arguments used to run ensemble tools.
Command line flags defined by this class are:
--log-level
--log-file
--plugin
The --log-level flag specifies the log level to use for the screen, the --log-file flag specifies a log file to write (debug and above), and the --plugin flag specifies the plugin Python file to use.
- Example:
import slypi.ensemble as ensemble
# describe and initialize parser
my_description = "My extension of the parameter space parser."
my_parser = ensemble.ArgumentParser(description=my_description)
# add an argument
my_parser.add_argument('--my-flag', help="My extension command line flag")
- parse_args(list_input=None)¶
Extends standard argparse.parse_args() call. Uses parse_known_args() to parse the base arguments and any additional arguments. Returns any unknown arguments as a list. The unknown arguments can be used to set plugin specific variables.
- Example:
# parse command line
args = my_parser.parse_args()
# parse command line and start logger
args, unknown_arg_list = my_parser.parse_args()
# arguments can be accessed using, e.g.
print(args.log_level)
# unknown arguments are returned as a list
print(unknown_arg_list)
- Warning:
Python uses prefix-matching, so if a plugin uses an argument flag that matches the prefix of an already existing argument (including any argument defined in a utility), that argument will not be passed on to the plugin.
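The prefix-matching behaviour can be reproduced with the standard library alone. The sketch below uses plain argparse parsers (not the SlyPI ArgumentParser) with two flags loosely modelled on those listed above; the --plug and --my-flag flags are hypothetical plugin flags. Whether the SlyPI parser accepts allow_abbrev is not stated in this documentation, so the second half is only an illustration of the argparse option.

```python
import argparse

# parser with two "base" flags, loosely modelled on the flags listed above
base = argparse.ArgumentParser()
base.add_argument('--log-level', default='info')
base.add_argument('--plugin')

# '--plug' is meant for a plugin, but it is a unique prefix of '--plugin'
command_line = ['--plug', 'value', '--my-flag', 'x']
args, unknown = base.parse_known_args(command_line)
# the base parser has consumed '--plug value' as '--plugin value',
# so only '--my-flag x' is left over for the plugin

# disabling abbreviations keeps the flag available for the plugin
strict = argparse.ArgumentParser(allow_abbrev=False)
strict.add_argument('--log-level', default='info')
strict.add_argument('--plugin')
strict_args, strict_unknown = strict.parse_known_args(command_line)
```

In practice the simplest protection is to choose plugin flag names that are not a prefix of any existing flag.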
- slypi.ensemble.init_logger(log_file=None, log_level='info')¶
Starts the slypi ensemble logger, sets the logging level for the console, and opens a log file, if desired. The console shows only the messages, while the log file also includes time stamps and the origin of each message. The log file is set to debug and above.
- Example:
import logging
import slypi.ensemble
# start logger for extension
my_log = logging.getLogger("slypi.my_extension")
my_log.info("My log message.")
# or use slypi ensemble logger
slypi.ensemble.log.info("My message.")
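For readers who want to see what such a console/file split looks like, here is a minimal sketch using only the standard logging module. It is not the SlyPI implementation: the function name init_logger_sketch, the logger name, and the exact format strings are assumptions made for illustration.

```python
import logging
import os
import tempfile


def init_logger_sketch(name, log_file=None, log_level='info'):
    """Console handler at log_level; optional file handler at debug."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    # console shows the message only
    console = logging.StreamHandler()
    console.setLevel(getattr(logging, log_level.upper()))
    console.setFormatter(logging.Formatter('%(message)s'))
    logger.addHandler(console)
    # log file adds time stamp and origin, and records debug and above
    if log_file is not None:
        to_file = logging.FileHandler(log_file)
        to_file.setLevel(logging.DEBUG)
        to_file.setFormatter(logging.Formatter(
            '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
        logger.addHandler(to_file)
    return logger


# demonstration: the debug message reaches the file but not the console
log_path = os.path.join(tempfile.mkdtemp(), 'sketch.log')
sketch_log = init_logger_sketch('sketch.logger', log_path, 'info')
sketch_log.debug('debug detail')
sketch_log.info('progress message')
for handler in sketch_log.handlers:
    handler.flush()
with open(log_path) as handle:
    log_text = handle.read()
```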
- class slypi.ensemble.utilities.Table(log, data_frame=None, csv_file=None, ensemble_spec=None, file_spec=None, header=None, no_index=False)¶
Provides storage and methods for keeping track of simulation data and files in an ensemble. The central assumption is that the ensemble is stored in a directory structure of the form (names are arbitrary):
ensemble
|-- ensemble.info
|-- simulation.1
    |-- simulation.info
    |-- time.step.1
    |-- time.step.2
    |-- ...
|-- simulation.2
|-- simulation.3
...
where ensemble is the central directory, containing multiple simulations, each having its own directory simulation.1, simulation.2, and so on. The simulation directories then contain files containing time step data time.step.1, time.step.2, etc. The directory names can be somewhat arbitrary and can have additional subdirectories and files, but the convention for using these utilities is that the simulation directories can be specified using a Python-like %d[::] specifier. For example, for the above ensemble, we would use ensemble/simulation.%d[1:] to specify the file name format for each of the simulation folders, then we would use time.step.%d[1:] to specify the time step files within each simulation folder.
The %d[::] notation specifies the order and numbers in the directory/file names using the Python slicing conventions. For example, %d specifies all names with an integer in the given location, %d[5:10] specifies all names with an integer starting at 5 and ending at 9, and %d[100:2:-2] specifies all names with an integer starting at 100 and descending by 2 to 2.
The numbers in the directory/file names are assumed to be >= 0 but are otherwise unrestricted.
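As an illustration of the idea, the following sketch expands a %d[::] specifier against a list of candidate names using strict Python slice semantics. It is not the SlyPI parser: the function expand_specifier and its regular expression are hypothetical, and under strict Python slicing the stop value is always excluded, so the handling of the end point in SlyPI itself may differ.

```python
import re

# matches "%d" optionally followed by "[start:stop]" or "[start:stop:step]"
SPEC_RE = re.compile(r"%d(?:\[(-?\d*):(-?\d*)(?::(-?\d*))?\])?")


def expand_specifier(spec, names):
    """Return the names matching spec, ordered by the slice in spec."""
    match = SPEC_RE.search(spec)
    if match is None:
        raise ValueError("no %d specifier in " + spec)
    # empty or missing slice fields become None, as in Python slicing
    start, stop, step = (int(g) if g else None for g in match.groups())
    # the text around the specifier must match exactly
    pattern = re.compile(re.escape(spec[:match.start()]) + r"(\d+)" +
                         re.escape(spec[match.end():]))
    # map each integer found to the name that contains it
    found = {}
    for name in names:
        name_match = pattern.fullmatch(name)
        if name_match:
            found[int(name_match.group(1))] = name
    if not found:
        return []
    # slice the range of possible integers, then keep those that exist
    wanted = range(max(found) + 1)[slice(start, stop, step)]
    return [found[number] for number in wanted if number in found]


sim_names = ['simulation.%d' % i for i in range(12)]
forward = expand_specifier('simulation.%d[5:10]', sim_names)
backward = expand_specifier('simulation.%d[10:2:-2]', sim_names)
```

Here forward holds simulation.5 through simulation.9, and backward holds simulation.10, simulation.8, simulation.6 and simulation.4.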
To instantiate an ensemble.Table object, use either a .csv file to create a full table, or an ensemble_spec, file_spec, and table header to create a single column table.
- Parameters:
csv_file=None (string) – file name of .csv file
ensemble_spec=None (string) – string with %d[::] giving simulation directories
file_spec=None (string) – string with %d[::] giving time step file names
header=None (string) – name for column header
Note: If csv_file is provided then the other inputs are ignored. If no inputs are provided then an empty table is created.
- add_col(col, header)¶
Adds a column to the ensemble table. This column can be a list or a dictionary. If it is a list, the column is added in the list order. If it is a dictionary, the column is added in the order of the dictionary keys by matching with an existing column. The column is added at the end of the table.
- Parameters:
col (list or dict) – column data to add
header (string) – name of new column
- convert_cols(cols, uri_root)¶
Converts the specified columns in the table to have the given URI root.
- Parameters:
cols (list) – names of columns to convert
uri_root (string) – URI root to use for conversion
Note: Resulting file is output in unix format (forward slashes).
- convert_specifier(file_spec, output_dir, output_type)¶
Returns an output file specifier matching the provided input specifier.
- Parameters:
file_spec (string) – file path with %d[::] specifier
output_dir (string) – output directory for file spec files
output_type (string) – file extension of output files
- Returns:
output file path with %d[::] specifier
- Return type:
out_file_spec (string)
- directories(directory_spec)¶
Return a list of directories matching %d[::] specifier. The specifier is expanded and existing directories are identified and returned.
- Parameters:
directory_spec (string) – directory name with %d[::] specifier
- Returns:
list of directories matching specifier
- Return type:
directory_list (list)
- ensemble_files(ensemble_dirs, parallel=False)¶
Returns a list of lists of files matching %d[::] specifier in the ensemble_dirs. The specifier is expanded and existing files are identified and returned. The directories are expected to exist.
- Parameters:
ensemble_dirs (list) – list of ensemble directories to expand (with specifier)
parallel (boolean) – run in parallel with ipyparallel (default False)
- Returns:
list of lists of files matching specifier
- Return type:
sim_files (list)
- files(file_spec)¶
Return a list of files matching %d[::] specifier. The specifier is expanded and existing files are identified and returned.
- Parameters:
file_spec (string) – file path with %d[::] specifier
- Returns:
list of files matching specifier
- Return type:
file_list (list)
- get_col(col)¶
Returns a column from the ensemble table.
- Parameters:
col (string) – name of column to return
- Returns:
list of contents in column
- Return type:
col_list (list)
- mirror_directories(output_dir, ensemble_dirs, over_write)¶
Creates a set of directories in the output directory which mirror the ensemble directory structure, unless output directory already exists.
- Parameters:
output_dir (string) – name of output directory to create
ensemble_dirs (list) – list of ensemble directories to mirror
over_write (boolean) – true to over-write existing directories
- Returns:
list of mirror directories (including output_dir), or None if directories were not created
- Return type:
mirror_dirs (list)
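The documented behaviour (create mirrors, return None when the output directory already exists and over-writing is not requested) can be sketched with the standard library as follows. This is not the SlyPI code: the function name mirror_directories_sketch is hypothetical, and mirroring each directory by its final path component is an assumption.

```python
import os
import tempfile


def mirror_directories_sketch(output_dir, ensemble_dirs, over_write=False):
    # refuse to touch an existing output directory unless told otherwise
    if os.path.exists(output_dir) and not over_write:
        return None
    os.makedirs(output_dir, exist_ok=True)
    mirror_dirs = [output_dir]
    for ensemble_dir in ensemble_dirs:
        # assumption: mirror by the final path component of each directory
        name = os.path.basename(os.path.normpath(ensemble_dir))
        mirror = os.path.join(output_dir, name)
        os.makedirs(mirror, exist_ok=True)
        mirror_dirs.append(mirror)
    return mirror_dirs


# demonstration in a temporary location
root = tempfile.mkdtemp()
sims = [os.path.join(root, 'ensemble', 'simulation.%d' % i) for i in (1, 2)]
out_dir = os.path.join(root, 'out')
first = mirror_directories_sketch(out_dir, sims)
second = mirror_directories_sketch(out_dir, sims)
third = mirror_directories_sketch(out_dir, sims, over_write=True)
```

The second call returns None because the output directory now exists; the third succeeds because over_write is set.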
- to_csv(file_out, output_dir='', cols=None, exc_cols=None, index=True, index_label=None)¶
Writes out the table to a .csv file.
- Parameters:
file_out (string) – name of .csv file
output_dir (string) – output directory to use for .csv file
cols (list) – list of column headers to output
exc_cols (list) – list of column headers to exclude from output
index (boolean) – write out index column
index_label (string) – use as index header
- class slypi.ensemble.utilities.EnsembleSpecifierError(specifier, message='invalid %d[::] format')¶
Exception raised for errors in %d[::] format specifier.
- Parameters:
specifier (string) – input specifier which caused the error
message (string) – explanation of the error
- class slypi.ensemble.PluginTemplate(description=None)¶
Provides an extensible architecture for accommodating different input/output formats, machine learning algorithms, and simulations.
Plugins must be defined as follows in a separate .py file:
- Example:
import slypi
class Plugin(slypi.ensemble.PluginTemplate):
    ...
See memphis.py for an example.
- add_args()¶
Note that these flags should not conflict with flags already in use (see parse_args in class ArgumentParser).
- Example:
# plugin adds command line argument
self.parser.add_argument("--my_option", help="My option for plugin.")
- check_args(args)¶
Checks plugin arguments and raises exceptions if there are errors.
- Parameters:
args (ArgumentParser object) – processed argument list
- convert_file(file_in, file_out, file_in_type=None, file_out_type=None)¶
Converts from file_in to file_out, where file_in can be a string or a buffer. File types are inferred from extensions unless provided. Uses the meshio library.
- Parameters:
file_in (string) – name of input file
file_out (string) – name of output file
file_in_type (string) – file input format (regardless of extension)
file_out_type (string) – file output format (extension)
- convert_files(file_list, output_dir, output_type, input_type=None)¶
Converts a list of files to files of type output_type in output_dir with the same root name. Input file types are inferred from extensions, unless type is provided. Output type is also inferred, unless provided.
- Parameters:
file_list (list) – list of file names to read
output_dir (string) – name of output directory to write files
output_type (string) – extension of file format for output
input_type (string) – file input type (regardless of extension)
- Returns:
list of files written using full path
- Return type:
files_written (list)
- expand(table, header, file_list, **kwargs)¶
Expands a column in an ensemble table by reading the file links and creating files appropriate to the plugin.
- Parameters:
table (ensemble Table object) – table with column containing file links
header (string or int) – name of column with file links
file_list (list) – list of files to expand
**kwargs – additional arguments dependent on plugin
- init(args)¶
Initialize any local variables from command line arguments.
- Parameters:
args (ArgumentParser object) – processed argument list
- Example:
self.my_var = args.my_option
- parse_args(arg_list=[])¶
Parses arguments specific to plugin.
- Parameters:
arg_list (list) – list of command line flags and arguments
- Returns:
ArgumentParser processed argument list, list of un-recognized arguments
- Return type:
args (object), unknown_args (list)
- preprocess(data)¶
Performs data pre-processing specific to a simulation. This code must be provided by the plugin, otherwise the data is returned unchanged. Note that this type of pre-processing is assumed to be per file (e.g. per time step or per simulation). Pre-processing that occurs over the entire ensemble is done by the algorithm codes (e.g. dimension reduction or proxy models).
- Parameters:
data (object) – data to be pre-processed
- Returns:
pre-processed numpy array with simulations per row
- Return type:
data_out (2d array)
- read_file(file_in, file_type=None)¶
Reads a file associated with a single time step in an ensemble. File type is inferred from extension unless provided.
- Parameters:
file_in (string) – name of file to read
file_type (string) – file input type (regardless of extension)
- Returns:
file contents
- Return type:
data (object)
- read_file_batch(batch_files, file_type=None, parallel=False, flatten=True)¶
Reads a batch of files from an ensemble. File type is inferred from extension unless provided. Can be run in parallel using ipyparallel.
- Parameters:
batch_files (list) – list of files to read
file_type (string) – file input type (regardless of extension)
parallel (boolean) – to run in parallel using ipyparallel
flatten (boolean) – flatten matrix files to vectors (defaults True)
- Returns:
list of file contents
- Return type:
data_list (list)
- read_input_deck(file_list, file_type=None)¶
Reads a file or files which provide the input parameters for a simulation. Note that this code must be provided by the plugin.
- Parameters:
file_list (list) – list of file names to read (can be a list of one file)
file_type (string) – file type (regardless of extension)
- Returns:
meta data for the simulation
- Return type:
file_data (object)
- write_file(data, file_out, file_type=None)¶
Writes time step data from an ensemble to a file. File type is inferred from extension unless provided.
- Parameters:
data (meshio mesh) – mesh data to be written
file_out (string) – file name for output file
file_type (string) – file extension
- slypi.ensemble.plugin(plugin_name, arg_list=None)¶
Factory function to instantiate a plugin module from a file and a list of command line arguments.
- Parameters:
plugin_name (string) – module name (no .py) or file name of module (ending in .py)
arg_list (list) – list containing command line flags and arguments
- Returns:
plugin as Python namespace, list of un-recognized arguments
- Return type:
plugin (object), unknown_args (list)
- Example:
# import and initialize plugin
plugin, unknown_args = slypi.ensemble.plugin(args.plugin, arg_list)
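For orientation, a factory of this kind can be built on the standard importlib machinery. The sketch below is not the SlyPI implementation: load_plugin_sketch, the temporary my_plugin.py file, and its preprocess body are hypothetical, and argument parsing is left out entirely.

```python
import importlib.util
import os
import tempfile


def load_plugin_sketch(plugin_file):
    """Import plugin_file and return an instance of its Plugin class."""
    module_name = os.path.splitext(os.path.basename(plugin_file))[0]
    spec = importlib.util.spec_from_file_location(module_name, plugin_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # by convention the file defines a class called Plugin
    return module.Plugin()


# write a tiny hypothetical plugin file and load it
plugin_dir = tempfile.mkdtemp()
plugin_path = os.path.join(plugin_dir, 'my_plugin.py')
with open(plugin_path, 'w') as handle:
    handle.write(
        "class Plugin:\n"
        "    def preprocess(self, data):\n"
        "        return [2 * value for value in data]\n")
loaded = load_plugin_sketch(plugin_path)
doubled = loaded.preprocess([1, 2, 3])
```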
- class slypi.ensemble.algorithms.reduction.DimensionReduction(arg_list=None, model_file=None)¶
This class contains wrappers for dimension reduction algorithms in scikit-learn for use with the slypi ensemble tools. It includes its own parser to specify algorithms and algorithm parameters.
- Parameters:
arg_list (list) – list of arguments specific to reduction
model_file (string) – name of model file containing reduction
- Example:
# get parser and reduction algorithm code
import slypi.ensemble as ensemble
import slypi.ensemble.algorithms.reduction as algorithms
# create parser
my_parser = ensemble.ArgumentParser()
# parse command line and start logger
args, arg_list = my_parser.parse_args()
# set up dimension reduction algorithm using command line arguments
algorithm = algorithms.DimensionReduction(arg_list=arg_list)
# use time_align to use a time-aligned model, where time_align
# specifies the number of dimensions to use per time step
time_aligned_algorithm = algorithms.DimensionReduction(time_align=10)
# set up data in variable X, data points per row
# do dimension reduction (ala sklearn)
algorithm.fit(X)
# reduce data to lower dimension
reduced_data = algorithm.transform(X)
- data_explained()¶
Returns the percent of information captured per dimension per model. For PCA this would be the explained variance ratio. If a model doesn’t compute this information, an empty list is returned.
- Returns:
list of vectors of percent captured
- Return type:
data_explained (list)
- fit(data, time_step=0)¶
Train dimension reduction model using samples.
- Parameters:
data (array) – data with points as rows
time_step (int) – train model at given time step
- has_inverse()¶
Check if user selected algorithm has an inverse.
- Returns:
True if algorithm has an inverse
- Return type:
has_inverse (bool)
- is_incremental()¶
Test if the user selected an incremental algorithm.
- Returns:
True if algorithm can be used in batch mode
- Return type:
is_incremental (bool)
- load(file_in)¶
Loads a dimension reduction model from a .pkl file.
- Parameters:
file_in (string) – file name with saved model
- num_dim()¶
Get desired number of dimensions in reduction.
- Returns:
number of dimensions for desired reduction
- Return type:
num_dim (int)
- partial_fit(data, time_step=0)¶
Train an incremental model using samples.
- Parameters:
data (array) – data with points as rows
time_step (int) – model time step
- save(file_out)¶
Saves a dimension reduction model to a .pkl file.
- Parameters:
file_out (string) – file name to save file
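A .pkl round trip of this kind typically relies on the standard pickle module. The sketch below assumes that and uses a plain dictionary as a hypothetical stand-in for a fitted model; the actual contents of a SlyPI model file are not described here. As with any pickle file, only load files from a trusted source.

```python
import os
import pickle
import tempfile

# hypothetical stand-in for a fitted reduction model
model = {'algorithm': 'PCA', 'num_dim': 2,
         'components': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]}

# save the model to a .pkl file
model_path = os.path.join(tempfile.mkdtemp(), 'reduction.pkl')
with open(model_path, 'wb') as handle:
    pickle.dump(model, handle)

# load it back and check that nothing was lost
with open(model_path, 'rb') as handle:
    restored = pickle.load(handle)
```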
- time_align(data, compute_rotations=True)¶
Time align reduced data using the Kabsch algorithm. Expects the incoming dimension to be time_align_dim and truncates the dimension to num_dim after alignment.
- Parameters:
data (list of array) – list of data matrices of shape (sim, dim)
compute_rotations (boolean) – False to use existing rotation matrices
- Returns:
list of data with shape (sim, reduced dim)
- Return type:
aligned_data (list of array)
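The Kabsch algorithm finds the proper rotation that best maps one point set onto another in the least-squares sense, normally through a singular value decomposition. In two dimensions the same optimal rotation has a closed form, which keeps the illustration dependency-free. The sketch below is that 2D special case only, not the SlyPI implementation; the function names and sample points are hypothetical, and the points are assumed to be centered at the origin already.

```python
import math


def optimal_rotation_2d(points, reference):
    """Angle of the proper rotation that best maps points onto reference."""
    # sum of dot products and of 2D cross products between paired points
    dot = sum(px * qx + py * qy
              for (px, py), (qx, qy) in zip(points, reference))
    cross = sum(px * qy - py * qx
                for (px, py), (qx, qy) in zip(points, reference))
    # the least-squares optimal angle maximizes cos*dot + sin*cross
    return math.atan2(cross, dot)


def rotate(points, theta):
    """Rotate 2D points counter-clockwise by theta radians."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(cos_t * x - sin_t * y, sin_t * x + cos_t * y)
            for x, y in points]


# reference time step, and a later time step that is the same data rotated
reference = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.5, -2.5)]
later = rotate(reference, 0.7)

# align the later step back onto the reference
theta = optimal_rotation_2d(later, reference)
aligned = rotate(later, theta)
```

Because the later step is an exact rotation of the reference by 0.7 radians, the recovered angle is -0.7 and the aligned points coincide with the reference up to rounding.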
- time_align_dim()¶
Get number of dimensions to use for time alignment.
- Returns:
number of time alignment dimensions
- Return type:
time_align (int)
- transform(data, time_step=0)¶
Transform data to lower dimensional representation.
- Parameters:
data (numpy array) – data with points as rows
time_step (int) – transform data at given time step
- Returns:
reduced data
- Return type:
data (array)