Python for SPARTA

This file contains some of the main routines for the python analysis package for SPARTA.

Code examples

The main purpose of this module is to translate SPARTA output from its HDF5 format into a python structure of dictionaries and structured arrays. In this section, we explore some of the functionality of the load() function through code examples. First, we need to import the sparta module. We also set a default filename variable:

from sparta_tools import sparta

filename = '/Users/you/some/dir/sparta.hdf5'

Basic structure

We can now attempt to execute the load function. It always returns a dictionary:

dic = sparta.load(filename)

>>> sparta.load: Loading file /Users/you/some/dir/sparta.hdf5.
>>> sparta.load: Loading 25181 halos from SPARTA file...

print(dic.keys())

>>> dict_keys(['anl_prf', 'tcr_ptl', 'halos', 'config', 'simulation'])

If we execute the load function without any parameters besides the filename, the function loads all data from the SPARTA file. For large files, this can take a long time or even exceed the available memory! In the case above, the file contained a total of 25181 halos. Here, the term “halo” means a branch in a merger tree, i.e., the history of a halo over time. This history begins when the halo is first detected by the halo finder and ends when the halo disappears, generally through merging into another, larger halo or because the simulation ends. Besides the halo data, the file contains particle tracer information (the tcr_ptl sub-dictionary) as well as a density profile analysis (anl_prf). See the Introduction for an explanation of these abbreviations.

In addition, the dictionary always contains the config and simulation sub-dictionaries which have the same content as the eponymous groups in the HDF5 file (see The SPARTA HDF5 output format). Let’s explore their content a little:

# Print all top-level configuration parameters, skipping nested dictionaries
for k in sorted(dic['config'].keys()):
    x = dic['config'][k]
    if not isinstance(x, dict):
        print('%-30s  %-40s' % (k, str(x)))

>>> cat_halo_jump_tol_box           0.03                                    
>>> cat_halo_jump_tol_phys          10.0                                    
>>> ...
>>> snap_path                       b'/Users/you/snapdir_%04d/snapshot_%04d.%d'
>>> snap_sim_type                   0                                       
>>> tcr_ptl_create_radius           2.0                                     
>>> tcr_ptl_delete_radius           3.0                 

The dictionary contains all user-defined configuration parameters (see Run-time configuration parameters). Note that we did not display the entries that are themselves dictionaries. Those contain the config parameters for particular Results or Analyses:

# Print only the entries that are themselves dictionaries
for k in sorted(dic['config'].keys()):
    x = dic['config'][k]
    if isinstance(x, dict):
        print(k)

>>> anl_prf
>>> res_oct

For example, for the orbit counting results that are contained in this example file:

for k in sorted(dic['config']['res_oct'].keys()):
    print('%-30s  %-40s' % (k, str(dic['config']['res_oct'][k])))

>>> res_oct_max_norbit              3 

Similarly, the simulation sub-dictionary contains information about the simulation that SPARTA was run on:

for k in sorted(dic['simulation'].keys()):
    print('%-20s  %-40s' % (k, str(dic['simulation'][k])))

>>> Omega_L               0.73                                    
>>> Omega_m               0.27                                    
>>> box_size              62.5                                    
>>> ...

Loading specific halos

Warning

The order of the returned halo data is the order in which the halos appear in the SPARTA file, not the order in which they are requested (if the halo_ids parameter is used).

It also means that the order may differ between two otherwise identical SPARTA runs, because the output ordering is non-deterministic. When comparing two files, always match halos by their IDs rather than by their position in the arrays.
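One way to avoid relying on position is to build an explicit lookup from halo ID to index in the returned data. The following is a minimal sketch that uses plain Python lists in place of the arrays returned by load(). The names requested_ids, returned_ids, returned_radii, and index_by_id are hypothetical, the values are made up, and each halo is reduced to a single ID for simplicity.

```python
# IDs in the order they were requested (made-up values).
requested_ids = [507, 12, 88]

# IDs in the order they appear in the file, i.e. the order of the returned data.
returned_ids = [12, 88, 507]
returned_radii = [1.2, 0.7, 2.5]  # stand-in for any per-halo quantity

# Map each halo ID to its position in the returned data.
index_by_id = {halo_id: i for i, halo_id in enumerate(returned_ids)}

# Look up each requested halo by ID instead of relying on its position.
radii_in_request_order = [returned_radii[index_by_id[h]] for h in requested_ids]
print(radii_in_request_order)  # [2.5, 1.2, 0.7]
```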

Function documentation

load([filename, hdf5_file, halo_ids, ...])

Load the contents of a SPARTA HDF5 results file.

findHalos([filename, hdf5_file, cuts, log_level])

Find halos in a SPARTA file according to certain criteria and output them as an ID list.

matchAnalyses(anl1, anl2)

Find matches between the halo IDs of two sets of analyses and return the matched arrays.

haloIsHost(status)

Decide whether a halo was a host given a SPARTA status.

haloIsSub(status)

Decide whether a halo was a subhalo given a SPARTA status.

haloIsSubPermanently(status)

Decide whether a halo was a subhalo for more than one snapshot given a SPARTA status.

haloIsGhost(status)

Decide whether a halo was a ghost given a SPARTA status.

sparta_tools.sparta.load(filename=None, hdf5_file=None, halo_ids=None, halo_mask=None, load_halo_data=True, analyses=None, tracers=None, results=None, anl_match=None, anl_pad_unmatched=True, res_match=None, res_pad_unmatched=True, log_level=1)

Load the contents of a SPARTA HDF5 results file.

Parameters
filename: str

The path to the sparta file. Either this field or hdf5_file must not be None.

hdf5_file: HDF5 file object

Sometimes multiple load operations need to be performed, in which case the user may prefer not to keep opening and closing the HDF5 file. If a valid file object is passed, that file object is used and the filename parameter is ignored.

halo_ids: array_like

If this field is None, the results for all halos are loaded. If it contains the catalog IDs of one or multiple halos (at any snapshot!), only the results for those halos will be loaded. The order of the returned halos may not be the same as the order of this input list!

halo_mask: array_like

If this field is None, the results for all halos are loaded (unless a selection is made with the halo_ids parameter instead). If not None, the parameter must be a numpy array with n_halos entries, where True means a halo is loaded. Such an array can, for example, be generated with the findHalos() function, and speeds up loading because the IDs do not have to be searched in the halo ID array.

load_halo_data: bool

If True, the properties of halos (such as the histories of their ID, radius, and status) are loaded; otherwise they are omitted.

analyses: array_like

If None, all analyses are loaded. Otherwise a list of analysis names to load, with the names corresponding to the abbreviated analysis names used in the sparta results file (e.g. “rsp”, see the introduction section of the documentation).

tracers: array_like

If None, all tracers are loaded. Otherwise a list of tracer names to load, with the names corresponding to the abbreviated tracer names used in the sparta results file (e.g. “ptl” or “sho”, see the introduction section of the documentation).

results: array_like

If None, all tracer results are loaded. Otherwise a list of result names to load, with the names corresponding to the abbreviated result names used in the sparta results file (e.g. “ifl” or “sbk”, see the introduction section of the documentation).

anl_match: array_like

If None, no matching is performed. Otherwise, the parameter can be the name of one analysis (e.g. ‘rsp’) or a list of analyses (e.g. [‘rsp’]) for which matching is performed. This means that the halo and analysis arrays have the same dimension, i.e. that each halo is assigned exactly one analysis of each of the given types. If a halo has more than one analysis of a given type, the first one is returned. The anl_pad_unmatched parameter determines what happens when halos do not have an analysis.

anl_pad_unmatched: bool

If we are matching analyses (see above), we can either discard halos that do not have the analyses in question (anl_pad_unmatched = False), or we can pad the analysis arrays with empty elements (anl_pad_unmatched = True). Such padded elements can easily be identified by their ID field which is halo_id = -1.

res_match: array_like

If None, no matching is performed. Otherwise, the parameter must be a list of results with at least two elements, e.g. res_match = ['ifl', 'sbk']. In that case, the listed results are matched by their tracer ID. Since all result arrays are sorted by tracer ID, the matched arrays will naturally be in the same order. The res_pad_unmatched parameter determines what happens to results that do not have a counterpart. If a tracer does not have one of the matched results, no matching is performed and a warning is output.

res_pad_unmatched: bool

If we are matching results (see above), we can either discard all unmatched results (res_pad_unmatched = False), i.e. results from tracers that do not have all of the matched types, or we can keep them (res_pad_unmatched = True). In the latter case, we need to insert void records where results are missing in order to keep the result arrays in the same order. The void records can easily be identified by their tracer_id = -1 value.

log_level: int

If zero, no output is generated. One is the default level of output; greater numbers lead to very detailed output including timing information.
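The effect of the anl_match and anl_pad_unmatched parameters described above can be illustrated with a small sketch. It does not use the sparta module. Plain lists of dictionaries stand in for the structured arrays, match_analyses_to_halos is a hypothetical helper, the value field is made up, and "first" is assumed to mean first in array order.

```python
def match_analyses_to_halos(halo_ids, analyses, pad_unmatched=True):
    """Assign each halo its first analysis; pad or drop halos without one."""
    # Keep only the first analysis encountered for each halo ID.
    first_by_halo = {}
    for anl in analyses:
        first_by_halo.setdefault(anl["halo_id"], anl)

    kept_halos, matched = [], []
    for halo_id in halo_ids:
        if halo_id in first_by_halo:
            kept_halos.append(halo_id)
            matched.append(first_by_halo[halo_id])
        elif pad_unmatched:
            # Padded element, recognizable by halo_id = -1.
            kept_halos.append(halo_id)
            matched.append({"halo_id": -1, "value": 0.0})
        # Otherwise the halo is discarded.
    return kept_halos, matched


halo_ids = [10, 20, 30]
analyses = [
    {"halo_id": 30, "value": 1.5},
    {"halo_id": 10, "value": 0.9},
    {"halo_id": 10, "value": 1.1},  # second analysis for halo 10, ignored
]

halos_padded, anl_padded = match_analyses_to_halos(halo_ids, analyses, True)
halos_strict, anl_strict = match_analyses_to_halos(halo_ids, analyses, False)

print([a["halo_id"] for a in anl_padded])  # [10, -1, 30]
print(halos_strict)                        # [10, 30]
```

In both cases the halo and analysis lists have the same length, which is the property that matching is meant to guarantee.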

Returns
dic: dictionary

A dictionary containing essentially the same file structure as the HDF5 file, depending on the options chosen.
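The matching of results by tracer ID (the res_match and res_pad_unmatched parameters) can be sketched in the same spirit. This is an illustration under stated assumptions, not the module's implementation: match_results is a hypothetical helper, lists of dictionaries stand in for the structured result arrays, each tracer is assumed to have at most one result of each type, and the tracer IDs are made up.

```python
def match_results(res_a, res_b, pad_unmatched=True):
    """Align two result lists (each sorted by tracer_id) by their tracer IDs."""
    by_id_a = {r["tracer_id"]: r for r in res_a}
    by_id_b = {r["tracer_id"]: r for r in res_b}
    void = {"tracer_id": -1}  # void record marking a missing result

    if pad_unmatched:
        # Keep every tracer that has at least one of the two results.
        tracer_ids = sorted(set(by_id_a) | set(by_id_b))
    else:
        # Keep only tracers that have both results.
        tracer_ids = sorted(set(by_id_a) & set(by_id_b))

    matched_a = [by_id_a.get(t, void) for t in tracer_ids]
    matched_b = [by_id_b.get(t, void) for t in tracer_ids]
    return matched_a, matched_b


ifl = [{"tracer_id": 3}, {"tracer_id": 5}, {"tracer_id": 9}]
sbk = [{"tracer_id": 5}, {"tracer_id": 7}, {"tracer_id": 9}]

ifl_pad, sbk_pad = match_results(ifl, sbk, pad_unmatched=True)
ifl_cut, sbk_cut = match_results(ifl, sbk, pad_unmatched=False)

print([r["tracer_id"] for r in ifl_pad])  # [3, 5, -1, 9]
print([r["tracer_id"] for r in sbk_pad])  # [-1, 5, 7, 9]
print([r["tracer_id"] for r in ifl_cut])  # [5, 9]
```

Because both inputs are sorted by tracer ID, the matched lists line up element by element, and the void records mark the positions where one of the results is missing.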

sparta_tools.sparta.findHalos(filename=None, hdf5_file=None, cuts=[], log_level=1)

Find halos in a SPARTA file according to certain criteria and output them as an ID list.

The result of this function can be used as input to other functions such as load(). By default, the function tries to find the quantity passed in each cut in the structured halo array in the SPARTA file. Some quantities are automatically generated, namely M200m (from R200m) and N200m (from M200m). Each cut dictionary must contain the following entries:

  • q: the identifier of the quantity to be cut on, e.g. R200m

  • min: the minimum value of this quantity

  • max: the maximum value of this quantity

  • possible parameters:

    • a / t / z / snap: the time where the cut is considered

    • a_max / t_max / z_max / snap_max: if passed, make a cut between t and t_max

A special case is a cut on the halo status, that is, whether a halo was a host, subhalo, or ghost. In that case, the keyword include must be in the dictionary and contain a list of statuses to include, which can be hosts, subs, or ghosts.
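To make the structure of the cut dictionaries concrete, the sketch below writes out two cuts and checks that they contain the entries listed above. The numerical limits, redshift, and snapshot numbers are arbitrary, and no units are implied. The check_cut helper is hypothetical and not part of the module, and the rule that at most one of a / t / z / snap is given is an assumption. The status cut is omitted here.

```python
# Two made-up cuts: one at a single time, one over a range of snapshots.
cuts = [
    {"q": "R200m", "min": 100.0, "max": 500.0, "z": 0.0},
    {"q": "M200m", "min": 1e12, "max": 1e14, "snap": 80, "snap_max": 100},
]

REQUIRED_KEYS = ("q", "min", "max")
TIME_KEYS = ("a", "t", "z", "snap")


def check_cut(cut):
    """Return True if a cut has the required entries and at most one time key."""
    has_required = all(key in cut for key in REQUIRED_KEYS)
    n_time_keys = sum(key in cut for key in TIME_KEYS)
    # The min/max comparison is only evaluated if both entries exist.
    return has_required and n_time_keys <= 1 and cut["min"] <= cut["max"]


print([check_cut(c) for c in cuts])  # [True, True]
```

A list like cuts is what would be passed as the cuts parameter of findHalos().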

Parameters
filename: str

The path to the sparta file. Either this field or hdf5_file must not be None.

hdf5_file: HDF5 file object

Sometimes multiple load operations need to be performed, in which case the user may prefer not to keep opening and closing the HDF5 file. If a valid file object is passed, that file object is used and the filename parameter is ignored.

cuts: array_like

A list of dictionaries, where each entry corresponds to a cut.

log_level: int

Output level

Returns
ids: array_like

A list of halo IDs.

mask: array_like

A boolean array of dimension n_halos which can be used to speed up the load() function.
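The relation between the two return values can be illustrated as follows. This sketch applies a single min/max cut to made-up per-halo values using plain Python lists. It assumes that the limits are inclusive, which the text above does not specify, and it is not the module's implementation.

```python
# One ID and one value per halo, in file order (made-up numbers).
halo_ids = [101, 118, 244, 31]
r200m = [350.0, 90.0, 620.0, 410.0]

cut = {"q": "R200m", "min": 100.0, "max": 500.0}

# The mask has n_halos entries; True marks halos that pass the cut.
mask = [cut["min"] <= r <= cut["max"] for r in r200m]

# The ID list contains only the halos selected by the mask.
ids = [h for h, keep in zip(halo_ids, mask) if keep]

print(mask)  # [True, False, False, True]
print(ids)   # [101, 31]
```

The mask carries the same selection as the ID list but already refers to positions in the file, which is why passing it as halo_mask avoids searching for the IDs.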

sparta_tools.sparta.matchAnalyses(anl1, anl2)

Find matches between the halo IDs of two sets of analyses and return the matched arrays.

The order of halos in SPARTA arrays is essentially random because it depends on the processes to which halos are assigned, and thus on the computing architecture. When comparing the results of two SPARTA runs, we must match the halo IDs. This function returns matched arrays of analyses. Their size may be smaller than that of the input arrays, depending on whether all analyses have matches.

Parameters
anl1: structured array

A set of halo analyses as returned by the load() function. For example, if dic was returned by the load() function, the profiles analysis (if it exists) can be found in dic['anl_prf'] which is a structured array that can serve as input to this function.

anl2: structured array

See above, for a second SPARTA file.

Returns
anl1_matched: structured array

The analyses in anl1 that have matches in anl2.

anl2_matched: structured array

The analyses in anl2 that have matches in anl1, in the same order as the anl1_matched returned.
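The idea behind this matching can be sketched with plain Python. The helper match_by_halo_id below is hypothetical and only illustrates the concept. It uses lists of dictionaries instead of structured arrays, the rsp field and all values are made up, and keeping the order of the first input is an assumption.

```python
def match_by_halo_id(anl1, anl2):
    """Return the entries of anl1 and anl2 that share a halo ID, in anl1 order."""
    by_id_2 = {a["halo_id"]: a for a in anl2}
    anl1_matched = [a for a in anl1 if a["halo_id"] in by_id_2]
    anl2_matched = [by_id_2[a["halo_id"]] for a in anl1_matched]
    return anl1_matched, anl2_matched


# The same halos appear in a different order in the two runs.
run1 = [{"halo_id": 4, "rsp": 1.10}, {"halo_id": 9, "rsp": 0.95}, {"halo_id": 2, "rsp": 1.30}]
run2 = [{"halo_id": 2, "rsp": 1.28}, {"halo_id": 7, "rsp": 1.02}, {"halo_id": 4, "rsp": 1.12}]

m1, m2 = match_by_halo_id(run1, run2)

print([a["halo_id"] for a in m1])  # [4, 2]
print([a["halo_id"] for a in m2])  # [4, 2]
```

Halos 9 and 7 each appear in only one run and are dropped, so the matched lists are shorter than the inputs and can be compared element by element.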

sparta_tools.sparta.haloIsHost(status)

Decide whether a halo was a host given a SPARTA status.

This function refers to the status field output in the halos group of the SPARTA output file, or the sparta_status field in MORIA output files. The final_status fields take on different meanings and cannot be evaluated with this function.

Parameters
status: array_like

One integer or a numpy array of integers indicating a SPARTA status.

Returns
is_host: array_like

Boolean number or array with the same dimensions as status, True if the status indicates that a halo was a host.

sparta_tools.sparta.haloIsSub(status)

Decide whether a halo was a subhalo given a SPARTA status.

This function refers to the status field output in the halos group of the SPARTA output file, or the sparta_status field in MORIA output files. The final_status fields take on different meanings and cannot be evaluated with this function.

Parameters
status: array_like

One integer or a numpy array of integers indicating a SPARTA status.

Returns
is_sub: array_like

Boolean number or array with the same dimensions as status, True if the status indicates that a halo was a subhalo.

sparta_tools.sparta.haloIsSubPermanently(status)

Decide whether a halo was a subhalo for more than one snapshot given a SPARTA status.

The distinction between a halo being a subhalo for one or for multiple snapshots may seem insignificant, but SPARTA treats fly-through events, where a halo is a sub for only one snapshot, somewhat differently.

This function refers to the status field output in the halos group of the SPARTA output file, or the sparta_status field in MORIA output files. The final_status fields take on different meanings and cannot be evaluated with this function.

Parameters
status: array_like

One integer or a numpy array of integers indicating a SPARTA status.

Returns
is_sub_permanently: array_like

Boolean number or array with the same dimensions as status, True if the status indicates that a halo was a subhalo for more than one snapshot.

sparta_tools.sparta.haloIsGhost(status)

Decide whether a halo was a ghost given a SPARTA status.

This function refers to the status field output in the halos group of the SPARTA output file, or the sparta_status field in MORIA output files. The final_status fields take on different meanings and cannot be evaluated with this function.

Parameters
status: array_like

One integer or a numpy array of integers indicating a SPARTA status.

Returns
is_ghost: array_like

Boolean number or array with the same dimensions as status, True if the status indicates that a halo was a ghost.