Storage

This module provides non-persistent and persistent storage, as well as interpolation, for use by any module in Colossus.

Basics

There are two levels of storage: non-persistent storage in memory and persistent storage on disk. All stored fields are kept in dictionaries in memory and can be of any type; if persistent storage is used, the data must be picklable. Persistent storage can be turned on and off by the user, both globally and for each field individually.

Each “user” of the storage module receives their own storage space and a uniquely identifying hash code that can be used to detect changes that make it necessary to reset the storage, for example changes in physical parameters to a model. The code example below shows how to set up a storage user within a class:

from colossus.utils import storage

class DemoClass():

        def __init__(self, some_parameter = 1.5):
                self.some_parameter = some_parameter
                self.su = storage.StorageUser('myModule', 'rw', self.getName, self.getHashableString, 
                                                self.reportChanges)
                self.some_data = [2.6, 9.5]
                self.su.storeObject('test_data', self.some_data, persistent = False)
                return

        def getName(self):
                return 'MyModule'

        def getHashableString(self):
                param_string = 'MyModule_%.4f' % (self.some_parameter)
                return param_string

        def reportChanges(self):
                print('Changes in this module detected, storage has been reset.')
                return

        def loadData(self):
                data = self.su.getStoredObject('test_data')
                return data

In the constructor, we pass the storage module three functions: one that returns a unique name for this module, one that returns a hashable string, and one that is called when changes are detected. The hashable string must change whenever previously stored data should be discarded, for example upon a parameter change. In the class above, the field some_data is discarded when a change in some_parameter is detected. The persistent parameter determines whether the object is written to disk as part of a pickle and loaded the next time the same user class (same name and same hash code) is instantiated. Let us now try loading the data:

dc = DemoClass()
print(dc.loadData())
>>> [2.6, 9.5]

Let us now change some_parameter. The data object is discarded and the load function returns None, indicating that no valid some_data for the new hash string was found:

dc.some_parameter = 1.2
print(dc.loadData())
>>> Changes in this module detected, storage has been reset.
>>> None
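The reset behavior above can be pictured with a small stand-alone sketch. It does not use Colossus: the class `MiniUser`, its attributes, and the choice of SHA-256 as the digest are assumptions made for illustration, not the library's implementation.

```python
import hashlib

class MiniUser:
    """Hypothetical stand-in for a storage user; not the Colossus class."""

    def __init__(self, func_hashstring, func_changed):
        self.func_hashstring = func_hashstring
        self.func_changed = func_changed
        self.storage = {}
        self.last_hash = self._hash()

    def _hash(self):
        # Assumption: any stable digest works; SHA-256 is only an example.
        return hashlib.sha256(self.func_hashstring().encode()).hexdigest()

    def _check(self):
        # Empty the storage if the hashable string changed since the last call.
        new_hash = self._hash()
        if new_hash != self.last_hash:
            self.storage = {}
            self.last_hash = new_hash
            self.func_changed()

    def store(self, name, data):
        self._check()
        self.storage[name] = data

    def load(self, name):
        self._check()
        return self.storage.get(name)


params = {'some_parameter': 1.5}
events = []
user = MiniUser(lambda: 'MyModule_%.4f' % params['some_parameter'],
                lambda: events.append('reset'))
user.store('test_data', [2.6, 9.5])
first = user.load('test_data')      # the stored list
params['some_parameter'] = 1.2      # changes the hashable string
second = user.load('test_data')     # storage was emptied, so None
```

Checking the hash on every access, as this sketch does, is one way to make a parameter change visible without the user having to announce it explicitly.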

The storage module offers native support for interpolation tables. For example, if we have stored a table of variables x and y, we can get a spline interpolator for y(x), or an inverse interpolator for x(y), by calling:

interp_y_of_x = su.getStoredObject('xy', interpolator = True)
interp_x_of_y = su.getStoredObject('xy', interpolator = True, inverse = True)

where su is a StorageUser object. The getStoredObject() function returns None if no object is found.
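To illustrate what a forward and an inverse interpolator over the same table mean, here is a minimal sketch. It uses piecewise-linear interpolation from the standard library rather than the splines the storage module returns, and `make_interpolator` is a hypothetical helper, not part of Colossus. The inverse only makes sense if y is strictly monotonic in x.

```python
from bisect import bisect_right

def make_interpolator(table, inverse=False):
    """Return f with f(x) ~ y, or f(y) ~ x if inverse, for a [2, n] table."""
    x, y = table
    if inverse:
        x, y = y, x
    # Sort by the independent variable so that bisect can be used.
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def interp(v):
        if not xs[0] <= v <= xs[-1]:
            raise ValueError('value outside the tabulated range')
        # Index of the right-hand node of the interval containing v.
        i = max(1, min(bisect_right(xs, v), len(xs) - 1))
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (v - x0) / (x1 - x0)

    return interp

table = [[0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]]   # y = 2 x
y_of_x = make_interpolator(table)
x_of_y = make_interpolator(table, inverse=True)
```

Swapping the two rows of the table is all that distinguishes the inverse from the forward interpolator here.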

Module reference

class utils.storage.StorageUser(module, persistence, func_name, func_hashstring, func_changed)

A storage user object allows access to persistent and non-persistent storage.

Parameters
module: str

The name of the module to which this user belongs. This name determines the cache sub-directory where files will be stored.

persistence: str

A combination of 'r' and 'w', e.g. 'rw' or '', indicating whether the storage is read from disk ('r'), written to disk ('w'), both, or neither.

func_name: function

A function that takes no parameters and returns the name of the user class.

func_hashstring: function

A function that takes no parameters and returns a unique string identifying the user class and any of its properties that, if changed, should trigger a resetting of the storage. If the hash string changes, the storage is emptied.

func_changed: function

A function that takes no parameters and will be called if the hash string has been found to have changed (see above).

Methods

checkForChangedHash()

Check whether the properties of the user class have changed.

getHash()

Get a unique string from the user class and convert it to a hash.

getStoredObject(object_name[, interpolator, ...])

Retrieve a stored object from memory or file.

getUniqueFilename()

Create a unique filename for this storage user.

resetStorage()

Reset the storage arrays and load persistent storage from file.

storeObject(object_name, object_data[, ...])

Save an object in memory and/or file storage.

getHash()

Get a unique string from the user class and convert it to a hash.

Returns
hash: str

A string that changes if the input string is changed, but can be much shorter than the input string.

getUniqueFilename()

Create a unique filename for this storage user.

Returns
filename: str

A filename that is unique to this module, storage user name, and the properties of the user as encapsulated in its hashable string.

checkForChangedHash()

Check whether the properties of the user class have changed.

Returns
has_changed: bool

Returns True if the hash has changed compared to the last stored hash.

resetStorage()

Reset the storage arrays and load persistent storage from file.

storeObject(object_name, object_data, persistent=True)

Save an object in memory and/or file storage.

The object is written to a dictionary in memory, and also to file if persistent == True and persistence contains 'w'.

Parameters
object_name: str

The name of the object by which it can be retrieved later.

object_data: any

The object; can be any picklable data type.

persistent: bool

If True, store this object on disk (if persistence is activated globally).
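The interplay between the per-object persistent flag and the user-wide persistence string can be sketched as follows. This is not the Colossus code: `store_object` is a hypothetical function, and writing one pickle file per object is an assumption made to keep the example short.

```python
import os
import pickle
import tempfile

def store_object(memory, cache_dir, persistence, name, data, persistent=True):
    """Hypothetical sketch: always store in memory, optionally pickle to disk."""
    memory[name] = data
    written = persistent and 'w' in persistence
    if written:
        # Assumption: one pickle file per object; Colossus may organize
        # its files differently.
        with open(os.path.join(cache_dir, name + '.pkl'), 'wb') as f:
            pickle.dump(data, f)
    return written

memory = {}
with tempfile.TemporaryDirectory() as cache_dir:
    a = store_object(memory, cache_dir, 'rw', 'table', [1, 2, 3])
    b = store_object(memory, cache_dir, 'r', 'other', [4, 5])
    c = store_object(memory, cache_dir, 'rw', 'temp', [6], persistent=False)
    files = sorted(os.listdir(cache_dir))
```

Only the first call reaches the disk; the other two objects live in memory alone, either because writing is disabled for the user or because the object was marked as non-persistent.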

getStoredObject(object_name, interpolator=False, inverse=False, path=None, store_interpolator=True, store_path_data=True)

Retrieve a stored object from memory or file.

If the object is already stored in memory, return it. If not, try to load it from file; if that also fails, return None. If the object is a 2-dimensional table, this function can also return an interpolator. If the path parameter is passed, the object is loaded from that file path.

Parameters
object_name: str

The name of the object to be loaded.

interpolator: bool

If True, return a spline interpolator instead of the underlying table. For this to work, the object data must either be an array of dimensionality [2, n] or a tuple with three entries of the format (x, y, z[x, y]) where x and y are ascending arrays and z is of dimensionality [len(x), len(y)].

inverse: bool

Return an interpolator that gives x(y) instead of y(x). This parameter only works for a 1-dimensional interpolator (see interpolator above).

path: str

If not None, data is loaded from this file path (unless it has already been loaded, in which case it is found in memory).

store_interpolator: bool

If True (the default), an interpolator that has been created is temporarily stored so that it does not need to be created again.

store_path_data: bool

If True (the default), data loaded from a file defined by path is stored temporarily so that it does not need to be loaded again.

Returns
object_data: any

Returns the loaded object (any picklable data type), or a scipy.interpolate.InterpolatedUnivariateSpline interpolator object, or None if no object was found.
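The lookup order described for getStoredObject() (memory first, then file, then None) can be sketched in a few lines. The function `get_stored_object`, the one-file-per-object layout, and the caching of file data in memory are assumptions for illustration, not the library's implementation.

```python
import os
import pickle
import tempfile

def get_stored_object(memory, cache_dir, persistence, name):
    """Hypothetical sketch of the lookup order: memory, then file, then None."""
    if name in memory:
        return memory[name]
    path = os.path.join(cache_dir, name + '.pkl')
    if 'r' in persistence and os.path.exists(path):
        with open(path, 'rb') as f:
            # Assumption: keep a copy in memory so the file is read only once.
            memory[name] = pickle.load(f)
        return memory[name]
    return None

with tempfile.TemporaryDirectory() as cache_dir:
    with open(os.path.join(cache_dir, 'on_disk.pkl'), 'wb') as f:
        pickle.dump({'x': 1}, f)
    memory = {'in_memory': [2.6, 9.5]}
    from_memory = get_stored_object(memory, cache_dir, 'rw', 'in_memory')
    from_file = get_stored_object(memory, cache_dir, 'rw', 'on_disk')
    missing = get_stored_object(memory, cache_dir, 'rw', 'unknown')
```

Because None signals a missing object, callers should test the return value before using it.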

utils.storage.getCacheDir(module=None)

Get a directory for the persistent caching of data. The function attempts to locate the home directory and (if necessary) create a ‘.colossus’ sub-directory. In the rare case where that fails, the location of this code file is used as a base directory.

Parameters
module: string

The name of the module that is requesting this cache directory. Each module has its own directory in order to avoid name conflicts.

Returns
path: string

The cache directory.
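The directory logic described above might look roughly like the sketch below. It is not the Colossus code: `get_cache_dir` is a hypothetical function, the `base` argument was added so that the example can run against a temporary directory, and the exact layout below the ‘.colossus’ directory is an assumption.

```python
import os
import tempfile

def get_cache_dir(base=None, module=None):
    """Hypothetical sketch of locating (and creating) a per-module cache directory."""
    if base is None:
        base = os.path.expanduser('~')
    path = os.path.join(base, '.colossus')
    if module is not None:
        path = os.path.join(path, module)
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:
        # Fallback described in the text: use the location of the code file.
        path = os.path.dirname(os.path.abspath(__file__))
    return path

with tempfile.TemporaryDirectory() as home:
    path = get_cache_dir(base=home, module='cosmology')
    exists = os.path.isdir(path)
    relative = os.path.relpath(path, home)
```

Giving each module its own sub-directory means that two modules can store objects under the same name without overwriting each other's files.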