SciPy

Python for MORIA

This file contains routines for the analysis of MORIA catalogs and merger trees.

Code examples

The main purpose of this module is to make it easier to load data from MORIA catalog or tree files, although the data format is straightforward and can also be directly loaded from the HDF5 files (see MORIA halo catalogs and merger trees for details). Unlike in the SPARTA module where the entire file contents are loaded into one dictionary, there are two functions to load data or the configuration from a MORIA file.

Basic file loading

The idea of the load() function is that the user can pass a series of fields and get back a structured array. For example, we can load a particular definition of the halo radius and the halo IDs:

from sparta_tools import moria

fn_catalog = '/Users/you/some/dir/moria_catalog.hdf5'
data = moria.load(fn_catalog, field_list = ['id', 'R200c_bnd_cat'], log_level = 1)

>>> Loading 2 fields from MORIA catalog with 8899 halos.

The data array now has a shape of the number of halos and each field can be called using its string identifier:

print(data.shape)
>>> (8899,)

print(data['id'])
>>> [1477557 1477562 1477564 ... 1470345 1482355 1474986]

print(data['R200c_bnd_cat'])
>>> [1029.2485     46.760006   75.87765  ...   46.040173   38.205635 46.040173]

We can also apply the function to tree files to get all snapshots. If we want to load a whole group of fields, we can use regular expressions:

fn_tree = '/Users/you/some/dir/moria_tree.hdf5'
d = moria.load(fn_tree, field_list = ['vmax', 'x', 'R500c*', 'Acc_Rate_1*Tdyn'], log_level = 2)

>>> Attempting to load the following fields from MORIA file:
>>>     vmax                                regex: False
>>>     x                                   regex: False
>>>     R500c*                              regex: True
>>>     Acc_Rate_1*Tdyn                     regex: False
>>> Loading 11 fields from MORIA tree with 7593 halos and 96 snapshots.
>>>     Acc_Rate_1*Tdyn                     shape: (96, 7593)
>>>     R500c_all_spa                       shape: (96, 7593)
>>>     R500c_bnd_cat                       shape: (96, 7593)
>>>     R500c_tcr_spa                       shape: (96, 7593)
>>>     parent_id_R500c_all_spa             shape: (96, 7593)
>>>     parent_id_R500c_bnd_cat             shape: (96, 7593)
>>>     parent_id_R500c_tcr_spa             shape: (96, 7593)
>>>     status_moria_hps_R500c_all_spa      shape: (96, 7593)
>>>     status_moria_hps_R500c_tcr_spa      shape: (96, 7593)
>>>     vmax                                shape: (96, 7593)
>>>     x                                   shape: (96, 7593, 3)

Note that Acc_Rate_1*Tdyn exists in the file and was thus not interpreted as a regular expression. In contrast, R500c* does not exist and was thus interpreted as a wildcard, meaning all fields related to R500c are loaded. Note that the structure of the returned array matches that in the MORIA file:

print(d.shape)
>>> (96, 7593)

The dimensions of individual fields (e.g., x) are handled within the structured array, making it easy to make cuts, for example selecting all halos at the final snapshot:

halo_data = d[-1, :]

or selecting all epochs of the first halo:

halo_history = d[0, :]

Tree loading and halo selection

The arrays we have loaded from MORIA tree files above contain zeros where halos were not valid. These halos can easily be excluded if we are loading only one snapshot:

data = moria.load(fn_tree, field_list = ['id'], a = 0.5, log_level = 1)
>>> Loading 1 fields from MORIA tree at snapshot 64, a = 0.503, loading 13161/15004 halos.

print(data.shape)
>>> (13161,)

Here, we have loaded the IDs of all halos that were alive at a = 0.5. The load function has automatically selected the correct snapshot and taken out halos that were not alive. However, this is not the same output we would get in the corresponding catalog file, because the tree includes all halos that were in a catalog file at any redshift. If we want only those halos that made the mass cut at the given redshift, we use the apply_cut parameter:

data = moria.load(fn_tree, field_list = ['id'], a = 0.5, apply_cut = True)
>>> Loading 1 fields from MORIA tree at snapshot 64, a = 0.503, loading 8267/15004 halos.

print(data.shape)       
>>> (8267,)

Now, the function has loaded fewer halos, as some halos were alive but not above the cut imposed on the MORIA run.

Manual loading

As we have seen, it is easy to use the load() function to load a specific snapshot from trees, but sometimes it may be desirable to perform this operation manually. This is relatively easy because the tree files contains a mask field. If reading the file directly, we load the mask and fields separately. For example, if we wanted to load R200c for all halos that were alive in the final snapshot, the code cool look like this:

f = h5py.File(fn_tree, 'r')
mask = np.array(f['mask_alive'][-1, :]).astype(bool)
R200c = np.array(f['R200c_tcr_spa'][-1, :])[mask]

Note the conversion to a boolean type; in the MORIA files, the masks are stored as 8-bit integers for compatibility with the rest of SPARTA. If we are using the load() function, this conversion is automatically taken care of:

data = moria.load(fn_tree, field_list = ['mask_alive', 'R200c_tcr_spa'])
mask = data['mask_alive'][-1, :]
R200c = data['R200c_tcr_spa'][-1, mask]

The R200c array now contains the radii of all halos that were alive in the final snapshot, but we do not know if those halos passed the mass (or other) cut that was imposed on the MORIA catalogs. To allow the user to impose the same cut, the mask_cut field is one only if the halo was output to the catalog at that snapshot, i.e., if it passed the cut. For example, the following code finds the halo radius for all halos that survived the cut at z = 1:

a = 0.5
data = moria.load(fn_tree, field_list = ['mask_cut', 'R200c_tcr_spa'])
cfg = moria.loadConfig(fn_tree)
snap_a = cfg['simulation']['snap_a']
snap_idx = np.argmin(np.abs(snap_a - a))
mask = data['mask_cut'][snap_idx, :]
R200c = data['R200c_tcr_spa'][snap_idx, mask]

Here, the loadConfig() function loads the rest of the contents of a MORIA file, that is, all configuration, simulation, and (if it is a catalog) snapshot properties.

Creating labels

The getLatexLabel() function can be used to produce latex-formatted labels for many common MORIA quantities (though not all, since the user can add arbitrary fields). For fields for which there is no obvious mathematical symbol, a latex-formatted string is returned. For example:

moria.getLatexLabel('acc_rate_200m_dyn')
>>> $\Gamma_{\rm dyn}$

moria.getLatexLabel('parent_id_R200c_all_spa')
>>> $\mathrm{parent\_id\_R200c\_all\_spa}$

Function documentation

load(filename, field_list[, snap_idx, a, z, ...])

Load specific fields or kinds of fields from a MORIA catalog or merger tree file.

loadConfig(filename)

Load all configuration and simulation parameters from a MORIA file into a dictionary.

getLatexLabel(q[, add_dollars])

Convert the name of a MORIA quantity to a latex-formatted label.

sparta_tools.moria.load(filename, field_list, snap_idx=None, a=None, z=None, a_tolerance=0.05, missing_snap_action='abort', return_snap_idx=False, return_mask=False, apply_cut=False, apply_cut_alive=True, use_h5py_masking=True, fail_on_missing=True, log_level=0)

Load specific fields or kinds of fields from a MORIA catalog or merger tree file.

The MORIA file format is simple: it contains one dataset per field, and the dimensions are either the number of halos (for catalog files) or the number of snapshots by the number of halos (for tree files). It is perfectly fine to directly load these fields using the h5py library. This function, however, makes it easier to generate a coherent data structure, namely a structured array where the fields carry the same string identifiers as in the file. The dimensions of the array are automatically adjusted to the overall dimensions (halos, snapshots) and to the dimensions of the individual fields (e.g., 3 for a position).

Moreover, this function allows the user to not only request specific fields but also to look for fields using regular expressions. If a requested fields contains any special characters, we interpret it as a regular expression. Note that passing invalid expressions will likely lead to errors. We except fields that exactly match one of the fields from the file, as some catalog fields may contain special characters. For example, if the user requests the field “my*field” and this field exists in the file, we only load this field but not “my_other_field”. By default, the function throws an exception if a regular field (i.e., not a special expression) could not be found in the file, but this behavior can be turned off.

Finally, if a tree file is being loaded, the user can select a particular snapshot, redshift, or scale factor to load. A mask is automatically applied so that only halos that were alive at that snapshot are loaded, or even only halos that make the catalog mass cut. If the latter behavior is invoked, the results are exactly the same as loading the same fields from the catalog file at the redshift in question (except for the ordering of the halos).

Parameters
filename: str

The path to the MORIA catalog or tree file.

field_list: array_like

A list of strings that denote either a field in the file or a regular expression.

snap_idx: int

If loading from a tree file: the index of the snapshot to be loaded. If None, all snapshots are loaded. Can be replaced by a or z, but only one of the three can be set.

a: float

If loading from a tree file: scale factor to be loaded. See snap_idx above.

z: float

If loading from a tree file: redshift to be loaded. See snap_idx above.

a_tolerance: float

If loading from a tree file: tolerance for difference in requested scale factor (whether requested as a or z) and the closest one found in the tree file. If the difference is greater than this tolerance, the missing_snap_action parameter decides what action is taken.

missing_snap_action: str

If loading from a tree file: action to be taken when the requested redshift or snapshot cannot be found. Can be abort in which case None is returned, warning in which case the closest snapshot is returned but a warning is printed (not recommended!), or error in which case the function aborts with an error message. If the snapshot was requested via a snap_idx, there is no way to return a closest match; thus, the warning option translates to error in that case.

return_snap_idx: bool

If loading from a tree file: return the snapshot index that was selected for a given redshift or scale factor. This is a non-trivial output because the redshift given by the user may not match the snapshot redshifts exactly.

return_mask: bool

If loading from a tree file: return the mask that was used to select halos at a given snapshot (whether the alive or cut mask, or none at all).

apply_cut: bool

If loading a snapshot from a tree file (see snap_idx above), this parameter determines whether the function returns all halos that were alive at the snapshot in question (if False) or only those halos that were output in the corresponding catalog, i.e., those that made the cut(s) specified in the MORIA run (if True).

apply_cut_alive: bool

If loading a snapshot from a tree file (see snap_idx above), this parameter determines whether the function returns all halos that were alive at the snapshot in question, or also halos that were not alive (which will appear as zeros in the array). The latter behavior is rarely needed, but can be useful when matching arrays from different load calls. If True, the returned array contains exactly the same number of halos as the MORIA tree file. If apply_cut is True, this parameter has no effect because all halos that are in the catalog are automatically also alive.

use_h5py_masking: bool

This parameter does not influence the output, but may influence the speed of the function. When loading a tree snapshot, we need to mask out other snapshots. This masking can be performed by the h5py library itself or on a numpy array. The latter may be slightly faster, but means that the entire field must be loaded from the file. The masking of active halos is always performed in numpy as it is much faster this way.

fail_on_missing: bool

If True, we stop if a requested field cannot be found in the file.

log_level: int

If zero, no output is generated. One is the default level of output, greater numbers lead to very detailed output.

Returns
d: array_like

A structured array containing the requested fields.

sparta_tools.moria.loadConfig(filename)

Load all configuration and simulation parameters from a MORIA file into a dictionary.

The function also returns a listing of all field names in the file.

Parameters
filename: str

The path to the MORIA catalog or tree file.

Returns
dic: dictionary

A dictionary containing the same group structure and parameters as the MORIA file.

sparta_tools.moria.getLatexLabel(q, add_dollars=True, **kwargs)

Convert the name of a MORIA quantity to a latex-formatted label.

Since MORIA files can contain user-defined quantities, there is no way to define a function that works for all possible quantities. This function encodes all halo radii, masses, peak heights and so on as well as the quantities listed in the MORIA files for the Erebos simulations. If no matching label is found, the function attempts to convert the string to a textual latex label (which can contain underscores but not spaces).

Parameters
q: str

The name of a quantity as listed in a MORIA hdf5 file.

add_dollars: bool

Whether or not to add $ signs at the beginning and end of the string.

kwargs: kwargs

Keyword args that are passed to the sparta_tools.halodef.HaloDefinition.toLabel() function (in addition to add_dollars).

Returns
label: str

A label in latex format.