Python for MORIA¶
This file contains routines for the analysis of MORIA catalogs and merger trees.
Code examples¶
The main purpose of this module is to make it easier to load data from MORIA catalog or tree files, although the data format is straightforward and can also be directly loaded from the HDF5 files (see MORIA halo catalogs and merger trees for details). Unlike in the SPARTA module where the entire file contents are loaded into one dictionary, there are two functions to load data or the configuration from a MORIA file.
Basic file loading
The idea of the load()
function is that the user can pass a series of fields and get back a
structured array. For example, we can load a particular definition of the halo radius and the halo
IDs:
from sparta_tools import moria
fn_catalog = '/Users/you/some/dir/moria_catalog.hdf5'
data = moria.load(fn_catalog, field_list = ['id', 'R200c_bnd_cat'], log_level = 1)
>>> Loading 2 fields from MORIA catalog with 8899 halos.
The data array now has a shape of the number of halos and each field can be called using its string identifier:
print(data.shape)
>>> (8899,)
print(data['id'])
>>> [1477557 1477562 1477564 ... 1470345 1482355 1474986]
print(data['R200c_bnd_cat'])
>>> [1029.2485 46.760006 75.87765 ... 46.040173 38.205635 46.040173]
We can also apply the function to tree files to get all snapshots. If we want to load a whole group of fields, we can use regular expressions:
fn_tree = '/Users/you/some/dir/moria_tree.hdf5'
d = moria.load(fn_tree, field_list = ['vmax', 'x', 'R500c*', 'Acc_Rate_1*Tdyn'], log_level = 2)
>>> Attempting to load the following fields from MORIA file:
>>> vmax regex: False
>>> x regex: False
>>> R500c* regex: True
>>> Acc_Rate_1*Tdyn regex: False
>>> Loading 11 fields from MORIA tree with 7593 halos and 96 snapshots.
>>> Acc_Rate_1*Tdyn shape: (96, 7593)
>>> R500c_all_spa shape: (96, 7593)
>>> R500c_bnd_cat shape: (96, 7593)
>>> R500c_tcr_spa shape: (96, 7593)
>>> parent_id_R500c_all_spa shape: (96, 7593)
>>> parent_id_R500c_bnd_cat shape: (96, 7593)
>>> parent_id_R500c_tcr_spa shape: (96, 7593)
>>> status_moria_hps_R500c_all_spa shape: (96, 7593)
>>> status_moria_hps_R500c_tcr_spa shape: (96, 7593)
>>> vmax shape: (96, 7593)
>>> x shape: (96, 7593, 3)
Note that Acc_Rate_1*Tdyn
exists in the file and was thus not interpreted as a regular
expression. In contrast, R500c*
does not exist and was thus interpreted as a wildcard,
meaning all fields related to R500c are loaded. Note that the structure of the returned array
matches that in the MORIA file:
print(d.shape)
>>> (96, 7593)
The dimensions of individual fields (e.g., x
) are handled within the structured array, making
it easy to make cuts, for example selecting all halos at the final snapshot:
halo_data = d[-1, :]
or selecting all epochs of the first halo:
halo_history = d[0, :]
Tree loading and halo selection
The arrays we have loaded from MORIA tree files above contain zeros where halos were not valid. These halos can easily be excluded if we are loading only one snapshot:
data = moria.load(fn_tree, field_list = ['id'], a = 0.5, log_level = 1)
>>> Loading 1 fields from MORIA tree at snapshot 64, a = 0.503, loading 13161/15004 halos.
print(data.shape)
>>> (13161,)
Here, we have loaded the IDs of all halos that were alive at a = 0.5. The load function has
automatically selected the correct snapshot and taken out halos that were not alive. However,
this is not the same output we would get in the corresponding catalog file, because the tree
includes all halos that were in a catalog file at any redshift. If we want only those halos that
made the mass cut at the given redshift, we use the apply_cut
parameter:
data = moria.load(fn_tree, field_list = ['id'], a = 0.5, apply_cut = True)
>>> Loading 1 fields from MORIA tree at snapshot 64, a = 0.503, loading 8267/15004 halos.
print(data.shape)
>>> (8267,)
Now, the function has loaded fewer halos, as some halos were alive but not above the cut imposed on the MORIA run.
Manual loading
As we have seen, it is easy to use the load()
function to load a specific snapshot from
trees, but sometimes it may be desirable to perform this operation manually. This is relatively
easy because the tree files contains a mask field. If reading the file directly, we load the mask
and fields separately. For example, if we wanted to load R200c for all halos that were alive in the
final snapshot, the code cool look like this:
f = h5py.File(fn_tree, 'r')
mask = np.array(f['mask_alive'][-1, :]).astype(bool)
R200c = np.array(f['R200c_tcr_spa'][-1, :])[mask]
Note the conversion to a boolean type; in the MORIA files, the masks are stored as 8-bit integers
for compatibility with the rest of SPARTA. If we are using the load()
function, this
conversion is automatically taken care of:
data = moria.load(fn_tree, field_list = ['mask_alive', 'R200c_tcr_spa'])
mask = data['mask_alive'][-1, :]
R200c = data['R200c_tcr_spa'][-1, mask]
The R200c array now contains the radii of all halos that were alive in the final snapshot, but we
do not know if those halos passed the mass (or other) cut that was imposed on the MORIA catalogs.
To allow the user to impose the same cut, the mask_cut
field is one only if the halo was
output to the catalog at that snapshot, i.e., if it passed the cut. For example, the following code
finds the halo radius for all halos that survived the cut at z = 1:
a = 0.5
data = moria.load(fn_tree, field_list = ['mask_cut', 'R200c_tcr_spa'])
cfg = moria.loadConfig(fn_tree)
snap_a = cfg['simulation']['snap_a']
snap_idx = np.argmin(np.abs(snap_a - a))
mask = data['mask_cut'][snap_idx, :]
R200c = data['R200c_tcr_spa'][snap_idx, mask]
Here, the loadConfig()
function loads the rest of the contents of a MORIA file, that is, all
configuration, simulation, and (if it is a catalog) snapshot properties.
Creating labels
The getLatexLabel()
function can be used to produce latex-formatted labels for many common
MORIA quantities (though not all, since the user can add arbitrary fields). For fields for which
there is no obvious mathematical symbol, a latex-formatted string is returned. For example:
moria.getLatexLabel('acc_rate_200m_dyn')
>>> $\Gamma_{\rm dyn}$
moria.getLatexLabel('parent_id_R200c_all_spa')
>>> $\mathrm{parent\_id\_R200c\_all\_spa}$
Function documentation¶
|
Load specific fields or kinds of fields from a MORIA catalog or merger tree file. |
|
Load all configuration and simulation parameters from a MORIA file into a dictionary. |
|
Convert the name of a MORIA quantity to a latex-formatted label. |
- sparta_tools.moria.load(filename, field_list, snap_idx=None, a=None, z=None, a_tolerance=0.05, missing_snap_action='abort', return_snap_idx=False, return_mask=False, apply_cut=False, apply_cut_alive=True, use_h5py_masking=True, fail_on_missing=True, log_level=0)¶
Load specific fields or kinds of fields from a MORIA catalog or merger tree file.
The MORIA file format is simple: it contains one dataset per field, and the dimensions are either the number of halos (for catalog files) or the number of snapshots by the number of halos (for tree files). It is perfectly fine to directly load these fields using the h5py library. This function, however, makes it easier to generate a coherent data structure, namely a structured array where the fields carry the same string identifiers as in the file. The dimensions of the array are automatically adjusted to the overall dimensions (halos, snapshots) and to the dimensions of the individual fields (e.g., 3 for a position).
Moreover, this function allows the user to not only request specific fields but also to look for fields using regular expressions. If a requested fields contains any special characters, we interpret it as a regular expression. Note that passing invalid expressions will likely lead to errors. We except fields that exactly match one of the fields from the file, as some catalog fields may contain special characters. For example, if the user requests the field “my*field” and this field exists in the file, we only load this field but not “my_other_field”. By default, the function throws an exception if a regular field (i.e., not a special expression) could not be found in the file, but this behavior can be turned off.
Finally, if a tree file is being loaded, the user can select a particular snapshot, redshift, or scale factor to load. A mask is automatically applied so that only halos that were alive at that snapshot are loaded, or even only halos that make the catalog mass cut. If the latter behavior is invoked, the results are exactly the same as loading the same fields from the catalog file at the redshift in question (except for the ordering of the halos).
- Parameters
- filename: str
The path to the MORIA catalog or tree file.
- field_list: array_like
A list of strings that denote either a field in the file or a regular expression.
- snap_idx: int
If loading from a tree file: the index of the snapshot to be loaded. If
None
, all snapshots are loaded. Can be replaced bya
orz
, but only one of the three can be set.- a: float
If loading from a tree file: scale factor to be loaded. See
snap_idx
above.- z: float
If loading from a tree file: redshift to be loaded. See
snap_idx
above.- a_tolerance: float
If loading from a tree file: tolerance for difference in requested scale factor (whether requested as
a
orz
) and the closest one found in the tree file. If the difference is greater than this tolerance, themissing_snap_action
parameter decides what action is taken.- missing_snap_action: str
If loading from a tree file: action to be taken when the requested redshift or snapshot cannot be found. Can be
abort
in which caseNone
is returned,warning
in which case the closest snapshot is returned but a warning is printed (not recommended!), orerror
in which case the function aborts with an error message. If the snapshot was requested via asnap_idx
, there is no way to return a closest match; thus, thewarning
option translates toerror
in that case.- return_snap_idx: bool
If loading from a tree file: return the snapshot index that was selected for a given redshift or scale factor. This is a non-trivial output because the redshift given by the user may not match the snapshot redshifts exactly.
- return_mask: bool
If loading from a tree file: return the mask that was used to select halos at a given snapshot (whether the alive or cut mask, or none at all).
- apply_cut: bool
If loading a snapshot from a tree file (see
snap_idx
above), this parameter determines whether the function returns all halos that were alive at the snapshot in question (ifFalse
) or only those halos that were output in the corresponding catalog, i.e., those that made the cut(s) specified in the MORIA run (ifTrue
).- apply_cut_alive: bool
If loading a snapshot from a tree file (see
snap_idx
above), this parameter determines whether the function returns all halos that were alive at the snapshot in question, or also halos that were not alive (which will appear as zeros in the array). The latter behavior is rarely needed, but can be useful when matching arrays from different load calls. IfTrue
, the returned array contains exactly the same number of halos as the MORIA tree file. Ifapply_cut
isTrue
, this parameter has no effect because all halos that are in the catalog are automatically also alive.- use_h5py_masking: bool
This parameter does not influence the output, but may influence the speed of the function. When loading a tree snapshot, we need to mask out other snapshots. This masking can be performed by the h5py library itself or on a numpy array. The latter may be slightly faster, but means that the entire field must be loaded from the file. The masking of active halos is always performed in numpy as it is much faster this way.
- fail_on_missing: bool
If
True
, we stop if a requested field cannot be found in the file.- log_level: int
If zero, no output is generated. One is the default level of output, greater numbers lead to very detailed output.
- Returns
- d: array_like
A structured array containing the requested fields.
- sparta_tools.moria.loadConfig(filename)¶
Load all configuration and simulation parameters from a MORIA file into a dictionary.
The function also returns a listing of all field names in the file.
- Parameters
- filename: str
The path to the MORIA catalog or tree file.
- Returns
- dic: dictionary
A dictionary containing the same group structure and parameters as the MORIA file.
- sparta_tools.moria.getLatexLabel(q, add_dollars=True, **kwargs)¶
Convert the name of a MORIA quantity to a latex-formatted label.
Since MORIA files can contain user-defined quantities, there is no way to define a function that works for all possible quantities. This function encodes all halo radii, masses, peak heights and so on as well as the quantities listed in the MORIA files for the Erebos simulations. If no matching label is found, the function attempts to convert the string to a textual latex label (which can contain underscores but not spaces).
- Parameters
- q: str
The name of a quantity as listed in a MORIA hdf5 file.
- add_dollars: bool
Whether or not to add $ signs at the beginning and end of the string.
- kwargs: kwargs
Keyword args that are passed to the
sparta_tools.halodef.HaloDefinition.toLabel()
function (in addition toadd_dollars
).
- Returns
- label: str
A label in latex format.