Tree#

class heptools.root.chunk.Chunk(source, name='Events', branches=..., num_entries=..., entry_start=..., entry_stop=..., fetch=False)[source]#

A chunk of TTree stored in a ROOT file.

Parameters:
  • source (PathLike or tuple[PathLike, UUID]) – Path to ROOT file, optionally paired with its UUID.

  • name (str, optional, default='Events') – Name of TTree.

  • branches (Iterable[str], optional) – Names of branches. If not given, read from source.

  • num_entries (int, optional) – Number of entries. If not given, read from source.

  • entry_start (int, optional) – Start entry. If not given, set to 0.

  • entry_stop (int, optional) – Stop entry. If not given, set to num_entries.

  • fetch (bool, optional, default=False) – Fetch missing metadata from source immediately after initialization.

Notes

The following special methods are implemented:

  • __hash__()

  • __eq__()

  • __len__()

  • __repr__()

path[source]#

Path to ROOT file.

Type:

EOS

uuid[source]#

UUID of ROOT file.

Type:

UUID

name[source]#

Name of TTree.

Type:

str

branches[source]#

Names of branches.

Type:

frozenset[str]

num_entries[source]#

Number of entries.

Type:

int

property entry_start[source]#

Start entry.

Type:

int

property entry_stop[source]#

Stop entry.

Type:

int

property offset[source]#

Equal to entry_start.

Type:

int

integrity()[source]#

Check and report the following:

Returns:

Chunk or None – A deep copy of self with corrected metadata, or None if the file does not exist.

deepcopy(**kwargs)[source]#
Parameters:

**kwargs (dict, optional) – Override entry_start, entry_stop or branches.

Returns:

Chunk – A deep copy of self.

key()[source]#
Returns:

Chunk – A deep copy of self that keeps only the properties used by __hash__.

slice(start, stop)[source]#
Parameters:
  • start (int) – Entry start.

  • stop (int) – Entry stop.

Returns:

Chunk – A sliced deepcopy() of self from start + offset to stop + offset.
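
The offset arithmetic used by slice() can be illustrated without heptools. The sketch below is a standalone model, not the library's code: EntryRange and its slice method are hypothetical stand-ins that track only entry_start and entry_stop. It assumes that start and stop are counted relative to the chunk's own first entry, as the return description suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryRange:
    """Hypothetical stand-in for the entry bookkeeping of a Chunk."""

    entry_start: int
    entry_stop: int

    @property
    def offset(self) -> int:
        # Documented as equal to entry_start.
        return self.entry_start

    def slice(self, start: int, stop: int) -> "EntryRange":
        # Shift the relative bounds by the offset to get absolute entries.
        return EntryRange(start + self.offset, stop + self.offset)


chunk = EntryRange(entry_start=100, entry_stop=200)
piece = chunk.slice(10, 30)
print(piece)  # EntryRange(entry_start=110, entry_stop=130)
```

In this model, slicing a range with (0, entry_stop - entry_start) reproduces the original range.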

classmethod from_path(*paths, executor=None)[source]#

Create Chunk from paths and fetch metadata in parallel.

Parameters:
  • paths (tuple[tuple[str, str]]) – Path to ROOT file and name of TTree.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

list[Chunk] – List of chunks from paths.
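
Because the executor only needs a map() method, the executors in concurrent.futures are natural candidates. The following sketch shows the pattern in isolation. fetch_metadata and fetch_all are hypothetical placeholders introduced for illustration; they do not open any ROOT file and are not part of heptools.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_metadata(path_and_name):
    """Hypothetical placeholder for the per-file work (no ROOT file is opened)."""
    path, name = path_and_name
    return {"path": path, "name": name}


def fetch_all(*paths, executor=None):
    # Fall back to the built-in map, which runs sequentially in this thread.
    run = map if executor is None else executor.map
    return list(run(fetch_metadata, paths))


paths = [("a.root", "Events"), ("b.root", "Events")]
sequential = fetch_all(*paths)
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = fetch_all(*paths, executor=pool)
```

Executor.map() returns results in input order, so both calls produce the same list.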

classmethod common(*chunks)[source]#

Find common branches of chunks.

Parameters:

chunks (tuple[Chunk]) – Chunks to select common branches.

Returns:

list[Chunk] – Deep copies of chunks with only common branches.

classmethod partition(size, *chunks, common_branches=False)[source]#

Partition chunks into groups. The sum of entries in each group is equal to size except for the last one. The order of chunks is preserved.

Parameters:
  • size (int) – Size of each group.

  • chunks (tuple[Chunk]) – Chunks to partition.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – A group of chunks with total entries equal to size.
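
The grouping rule can be modeled with plain tuples. This is a minimal sketch of the documented behavior, not the heptools implementation: partition_ranges is a hypothetical function, each chunk is reduced to a (name, start, stop) tuple, and it is assumed that a chunk crossing a group boundary is cut in two.

```python
def partition_ranges(size, *ranges):
    """Group (name, start, stop) ranges so each group holds `size` entries.

    Hypothetical model: ranges that cross a group boundary are cut, and
    the input order is preserved.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    group, remaining = [], size
    for name, start, stop in ranges:
        while start < stop:
            take = min(remaining, stop - start)
            group.append((name, start, start + take))
            start += take
            remaining -= take
            if remaining == 0:
                yield group
                group, remaining = [], size
    if group:
        yield group  # the last group may hold fewer than `size` entries


groups = list(partition_ranges(100, ("a", 0, 150), ("b", 0, 80)))
for g in groups:
    print(g)
# [('a', 0, 100)]
# [('a', 100, 150), ('b', 0, 50)]
# [('b', 50, 80)]
```

The first two groups hold exactly 100 entries each, and the last one holds the remaining 30.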

classmethod balance(size, *chunks, common_branches=False)[source]#

Split chunks into smaller pieces with size entries in each. If that is not possible, try to find another size that minimizes the average deviation.

Parameters:
  • size (int) – Target number of entries in each chunk.

  • chunks (tuple[Chunk]) – Chunks to balance.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – Resized chunks with about size entries in each.
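
One plausible reading of "about size entries" is to cut each chunk into near-equal pieces whose count is the integer nearest to the chunk length divided by size. The sketch below implements only that reading. balance_range is a hypothetical helper, and the criterion heptools actually uses to pick the alternative size may differ.

```python
def balance_range(size, start, stop):
    """Split [start, stop) into near-equal pieces of about `size` entries.

    Assumption: the piece count is the integer nearest to n / size
    (Python's round, which resolves exact ties to the even integer).
    """
    n = stop - start
    pieces = max(1, round(n / size))
    base, extra = divmod(n, pieces)
    bounds = []
    for i in range(pieces):
        # The first `extra` pieces absorb the remainder, one entry each.
        length = base + (1 if i < extra else 0)
        bounds.append((start, start + length))
        start += length
    return bounds


print(balance_range(100, 0, 270))  # [(0, 90), (90, 180), (180, 270)]
```

With a target of 100 and 270 entries, three pieces of 90 entries each are closer to the target than two pieces of 135.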

to_json()[source]#

Convert self to JSON data.

Returns:

dict – JSON data.

classmethod from_json(data)[source]#

Create Chunk from JSON data.

Parameters:

data (dict) – JSON data.

Returns:

Chunk – A Chunk object from JSON data.

classmethod from_coffea_events(events)[source]#

Create Chunk when using coffea<=0.7.22.

Parameters:

events – Events generated by coffea.processor.Runner.

Returns:

Chunk – Chunk from events.

classmethod from_coffea_datasets(datasets)[source]#

Create Chunk when using coffea>=2023.12.0.

Parameters:

datasets – Datasets generated by coffea.dataset_tools.preprocess().

Returns:

dict[str, list[Chunk]] – A mapping from dataset names to lists of chunks using the partitions from datasets.

class heptools.root.chain.Chain[source]#

A TChain-like object that manages multiple Chunk and Friend objects.

The structure of output record is given by the following pseudo code:

  • library='ak':
    record[main.branch] = array
    record[friend.name][friend.branch] = array
    
  • library='pd', 'np':
    record[main.branch] = array
    record[friend.branch] = array
    

If duplicate branches are found after renaming, the one in the friend tree that appears last will be kept.
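
The two layouts can be mimicked with plain dictionaries. This is only an illustration of the pseudocode above: the branch and friend names are made up, lists stand in for arrays, and no heptools object is involved.

```python
main = {"pt": [1.0, 2.0]}
friends = [
    ("f1", {"weight": [0.5, 0.7]}),
    ("f2", {"weight": [0.9, 1.1], "tag": [0, 1]}),
]

# library='ak': friend branches are nested under the friend name.
nested = dict(main)
for friend_name, branches in friends:
    nested[friend_name] = dict(branches)

# library='pd' or 'np': friend branches share one flat namespace, so a
# later friend overwrites an earlier one that uses the same branch name.
flat = dict(main)
for _, branches in friends:
    flat.update(branches)

print(nested["f1"]["weight"])  # [0.5, 0.7]
print(flat["weight"])          # [0.9, 1.1]
```

In the nested layout both weight branches survive, while in the flat layout only the one from the last friend is kept.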

Notes

The following special methods are implemented:

add_chunk(*chunks)[source]#

Add Chunk to this chain.

Parameters:

chunks (tuple[Chunk]) – Chunks to add.

Returns:

self (Chain)

add_friend(*friends, renaming=None)[source]#

Add new Friend objects to this chain or merge them into the existing ones.

Parameters:
  • friends (tuple[Friend]) – Friends to add or merge.

  • renaming (str or Callable, optional) – If given, the branches in the friend trees will be renamed. See below for available keys.

Returns:

self (Chain)

Notes

The following keys are available for renaming:

  • {friend}: Friend.name

  • {branch}: branch name

If the renaming function returns a tuple, the data will be stored in a nested record.
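
The renaming rules can be sketched as follows. rename and store are hypothetical helpers written for this illustration, and the sketch assumes that a callable receives friend and branch as keyword arguments; the way heptools actually invokes the callable is not stated here.

```python
def rename(renaming, friend, branch):
    """Hypothetical helper: resolve the output key for one friend branch."""
    if callable(renaming):
        # Assumption: the callable takes `friend` and `branch` keywords.
        return renaming(friend=friend, branch=branch)
    return renaming.format(friend=friend, branch=branch)


def store(record, key, array):
    # A tuple key addresses a nested record, one level per element.
    if isinstance(key, tuple):
        *parents, leaf = key
        for part in parents:
            record = record.setdefault(part, {})
        record[leaf] = array
    else:
        record[key] = array


record = {}
store(record, rename("{friend}_{branch}", "f1", "weight"), [0.5])
store(record, rename(lambda friend, branch: (friend, branch), "f1", "tag"), [1])
print(record)  # {'f1_weight': [0.5], 'f1': {'tag': [1]}}
```

A format string always yields a flat key, while a function that returns a tuple places the branch inside a nested record.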

copy()[source]#
Returns:

Chain – A shallow copy of self.

concat(library='ak', reader_options=None, friend_only=False)[source]#

Read all chunks and friend trees into one record.

Parameters:
  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

RecordLike – Concatenated data.

iterate(step=..., library='ak', mode='partition', reader_options=None, friend_only=False)[source]#

Iterate over chunks and friend trees.

Parameters:
  • step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps. See iterate() for details.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Yields:

RecordLike – A chunk of merged data from main and friend TTree.

dask(partition=..., library='ak', reader_options=None, friend_only=False)[source]#

Read chunks and friend trees into delayed arrays.

Warning

The renaming option will be ignored when using library='ak'.

Parameters:
  • partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

DelayedRecordLike – Delayed data from main and friend TTree.