Tree#

class heptools.root.Chunk(source=None, name='Events', branches=..., num_entries=..., entry_start=..., entry_stop=..., fetch=False)[source]#

A chunk of TTree stored in a ROOT file.

Parameters:
  • source (PathLike or tuple[PathLike, UUID]) – Path to the ROOT file, optionally with its UUID.

  • name (str, optional, default='Events') – Name of TTree.

  • branches (Iterable[str], optional) – Names of branches. If not given, read from source.

  • num_entries (int, optional) – Number of entries. If not given, read from source.

  • entry_start (int, optional) – Start entry. If not given, set to 0.

  • entry_stop (int, optional) – Stop entry. If not given, set to num_entries.

  • fetch (bool, optional, default=False) – Fetch missing metadata from source immediately after initialization.

Notes

The following special methods are implemented:

  • __hash__()

  • __eq__()

  • __len__()

  • __repr__()
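
For illustration, a chunk might be created as follows (a minimal sketch; the file path is a placeholder):

>>> from heptools.root import Chunk
>>> chunk = Chunk(
>>>     source='root://host//store/data.root',  # placeholder path
>>>     name='Events',
>>>     fetch=True)  # read branches and num_entries from the file immediately
>>> len(chunk)  # __len__ is implemented; presumably the number of entries in this chunk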

path: heptools.system.eos.EOS#

Path to ROOT file.

Type:

EOS

uuid: uuid.UUID#

UUID of ROOT file.

Type:

UUID

name: str#

Name of TTree.

Type:

str

branches: frozenset[str]#

Names of branches.

Type:

frozenset[str]

num_entries: int#

Number of entries.

Type:

int

property entry_start#

Start entry.

Type:

int

property entry_stop#

Stop entry.

Type:

int

property offset#

Equal to entry_start.

Type:

int

integrity()[source]#

Check the stored metadata against the source file and report any inconsistencies.

Returns:

Chunk or None – A deep copy of self with corrected metadata. If the file does not exist, return None.

deepcopy(**kwargs)[source]#
Parameters:

**kwargs (dict, optional) – Override entry_start, entry_stop or branches.

Returns:

Chunk – A deep copy of self.

key()[source]#
Returns:

Chunk – A deep copy of self that keeps only the properties used by __hash__.

slice(start, stop)[source]#
Parameters:
  • start (int) – Entry start.

  • stop (int) – Entry stop.

Returns:

Chunk – A sliced deepcopy() of self from start + offset to stop + offset.
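
A small sketch of how slicing interacts with offset (the source tuple and entry numbers are placeholders):

>>> chunk = Chunk(
>>>     source=('root://host//store/data.root', uuid),
>>>     name='Events',
>>>     entry_start=100,
>>>     entry_stop=200)
>>> sub = chunk.slice(10, 20)
>>> sub.entry_start, sub.entry_stop  # (110, 120), i.e. shifted by offset = 100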

classmethod from_path(*paths, executor=None)[source]#

Create Chunk from paths and fetch metadata in parallel.

Parameters:
  • paths (tuple[tuple[str, str]]) – Pairs of ROOT file path and TTree name.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

list[Chunk] – List of chunks from paths.
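
For example, a sketch using concurrent.futures.ThreadPoolExecutor, which provides map() (the paths are placeholders):

>>> from concurrent.futures import ThreadPoolExecutor
>>> chunks = Chunk.from_path(
>>>     ('root://host//store/a.root', 'Events'),
>>>     ('root://host//store/b.root', 'Events'),
>>>     executor=ThreadPoolExecutor(max_workers=2))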

classmethod common(*chunks)[source]#

Find common branches of chunks.

Parameters:

chunks (tuple[Chunk]) – Chunks from which to select common branches.

Returns:

list[Chunk] – Deep copies of chunks with only common branches.

classmethod partition(size, *chunks, common_branches=False)[source]#

Partition chunks into groups. The sum of entries in each group equals size, except possibly for the last group. The order of chunks is preserved.

Parameters:
  • size (int) – Size of each group.

  • chunks (tuple[Chunk]) – Chunks to partition.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – A group of chunks with total entries equal to size.
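
For example, splitting the chunks from the previous sketch into groups of roughly 100,000 entries (the group size is illustrative):

>>> for group in Chunk.partition(100_000, *chunks, common_branches=True):
>>>     total = sum(len(chunk) for chunk in group)  # 100_000, except possibly for the last group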

classmethod balance(size, *chunks, common_branches=False)[source]#

Split chunks into smaller pieces with size entries each. If this is not possible, another size that minimizes the average deviation is chosen.

Parameters:
  • size (int) – Target number of entries in each chunk.

  • chunks (tuple[Chunk]) – Chunks to balance.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – Resized chunks with about size entries in each.

to_json()[source]#

Convert self to JSON data.

Returns:

dict – JSON data.

classmethod from_json(data)[source]#

Create Chunk from JSON data.

Parameters:

data (dict) – JSON data.

Returns:

Chunk – A Chunk object from JSON data.
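
A sketch of a JSON round trip, e.g. to checkpoint chunk metadata to disk:

>>> import json
>>> serialized = json.dumps(chunk.to_json())
>>> restored = Chunk.from_json(json.loads(serialized))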

classmethod from_coffea_events(events)[source]#

Create Chunk when using coffea<=0.7.22.

Parameters:

events – Events generated by coffea.processor.Runner.

Returns:

Chunk – Chunk from events.

classmethod from_coffea_datasets(datasets)[source]#

Create Chunk when using coffea>=2023.12.0.

Parameters:

datasets – Datasets generated by coffea.dataset_tools.preprocess().

Returns:

dict[str, list[Chunk]] – A mapping from dataset names to lists of chunks using the partitions from datasets.

class heptools.root.Friend(name)[source]#

A tool to create and manage a collection of additional TBranch stored in separate ROOT files (also known as a friend TTree).

Parameters:

name (str) – Name of the collection.

Notes

The following special methods are implemented:

allow_missing#

name: str#

Name of the collection.

Type:

str

property branches#

All branches in the friend tree.

Type:

frozenset[str]

property targets#

All contiguous target chunks.

Type:

Generator[Chunk]

property n_fragments#

Number of friend tree files.

Type:

int

property n_entries#

Number of entries in the friend tree.

Type:

int

auto_dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#

Automatically dump the in-memory data when add() is called. The parameters are the same as dump().

Notes

Enable the auto-dump mode by using the with statement:

>>> with friend.auto_dump():
>>>     ...
>>>     friend.add(target, data)
>>>     ...
add(target, data)[source]#

Create a friend TTree for target using data.

Parameters:
  • target (Chunk) – A chunk of TTree.

  • data (RecordLike or Chunk) – Additional branches added to target.
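
A minimal sketch of attaching new branches (the path, uuid and values are placeholders; data is assumed to provide one entry per target entry):

>>> import awkward as ak
>>> target = Chunk(
>>>     source=('root://host//a/b/events.root', uuid),
>>>     name='Events',
>>>     entry_start=0,
>>>     entry_stop=3)
>>> friend = Friend('corrections')  # hypothetical collection name
>>> friend.add(target, ak.Array({'weight': [1.0, 1.1, 0.9]}))
>>> friend.dump(base_path='root://host//friends/')  # write the in-memory data to ROOT files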

arrays(target, library='ak', reader_options=None)[source]#

Fetch the friend TTree for target into an array.

Parameters:
  • target (Chunk) – A chunk of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

RecordLike – Data from friend TTree.

concat(*targets, library='ak', reader_options=None)[source]#

Fetch the friend TTree for targets into one array.

Parameters:
  • targets (tuple[Chunk]) – One or more chunks of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

RecordLike – Concatenated data.
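
For reading back, a sketch assuming the friend tree already covers the requested chunks:

>>> arr = friend.arrays(target, library='ak')  # friend branches for a single chunk
>>> df = friend.concat(*chunks, library='pd')  # friend branches for several chunks in one DataFrame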

dask(*targets, library='ak', reader_options=None)[source]#

Fetch the friend TTree for targets as delayed arrays. The partitions will be preserved.

Parameters:
  • targets (tuple[Chunk]) – Partitions of target TTree.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

DelayedRecordLike – Delayed arrays of entries from the friend TTree.

dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#

Dump all in-memory data to ROOT files with a given naming format.

Parameters:
  • base_path (PathLike, optional) – Base path to store the dumped files. See below for details.

  • naming (str or Callable, default="{name}_{uuid}_{start}_{stop}.root") – Naming format for the dumped files. See below for details.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.

Notes

Each dumped file will be stored in {base_path}/{naming.format(**keys)}. If base_path is not given, the corresponding target.path.parent will be used. The following keys are available:

  • {name}: name.

  • {uuid}: target.uuid

  • {tree}: target.name

  • {start}: target.entry_start

  • {stop}: target.entry_stop

  • {path0}, {path1}, … : target.path.parts in reversed order.

where the target is the one passed to add(). To apply operations beyond the built-in str.format() syntax, use a Callable instead.

Warning

The generated path is not guaranteed to be unique. If multiple chunks are dumped to the same path, the last one will overwrite the previous ones.

Examples

The naming format works as follows:

>>> friend = Friend('test')
>>> friend.add(
>>>     Chunk(
>>>         source=('root://host.1//a/b/c/target.root', uuid),
>>>         name='Events',
>>>         entry_start=100,
>>>         entry_stop=200,
>>>     ),
>>>     data
>>> )
>>> # write to root://host.1//a/b/c/test_uuid_Events_100_200_target.root
>>> friend.dump(
>>>     '{name}_{uuid}_{tree}_{start}_{stop}_{path0}')
>>> # or write to root://host.2//x/y/z/b/c/test_uuid_100_200.root
>>> friend.dump(
>>>     '{path2}/{path1}/{name}_{uuid}_{start}_{stop}.root',
>>>     'root://host.2//x/y/z/')
>>> # or write to root://host.1//a/b/c/tar_events_100_200.root
>>> def filename(**kwargs: str) -> str:
>>>     return f'{kwargs["path0"][:3]}_{kwargs["tree"]}_{kwargs["start"]}_{kwargs["stop"]}.root'.lower()
>>> friend.dump(filename)
cleanup(executor=None)[source]#

Remove invalid chunks.

Parameters:

executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

Friend – A copy of self with invalid chunks removed.

update(paths)[source]#

Update the paths of friend chunks using the new paths.

Parameters:

paths (Iterable[Chunk]) – Chunks with uuid and new path.

Returns:

Friend – A copy of self with paths updated.

reset(confirm=True, executor=None)[source]#

Reset the friend tree and delete all dumped files.

Parameters:
  • confirm (bool, optional, default=True) – Confirm the deletion.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

merge(step, chunk_size=..., base_path=..., naming='{name}_{uuid}_{start}_{stop}.root', reader_options=None, writer_options=None, clean=True, executor=None, transform=None, dask=False)[source]#

Merge contiguous chunks into a single file.

Warning

  • executor and dask cannot be used together.

  • If chunk_size is provided, using dask=True will give better parallelism.

Parameters:
  • step (int) – Number of entries to read and write in each iteration step.

  • chunk_size (int, optional) – Number of entries in each new chunk. If not given, all entries will be merged into one chunk.

  • base_path (PathLike, optional) – Base path to store the merged files. See notes of dump() for details.

  • naming (str or Callable, optional) – Naming format for the merged files. See notes of dump() for details.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • clean (bool, optional, default=True) – If True, clean the original friend chunks after merging.

  • executor (Executor, optional) – An executor with at least the submit() method implemented.

  • transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Friend or Delayed or Future[Friend] – A new friend tree with the merged chunks.
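
A sketch with illustrative sizes, writing the merged files under a placeholder base path:

>>> merged = friend.merge(
>>>     step=100_000,          # entries read and written per iteration
>>>     chunk_size=1_000_000,  # entries per merged chunk
>>>     base_path='root://host//friends/merged/')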

clone(base_path, naming=..., execute=False, executor=None)[source]#

Copy all chunks to a new location.

Parameters:
  • base_path (PathLike) – Base path to store the cloned files.

  • naming (str or Callable, optional) – Naming format for the cloned files. See below for details. If not given, will simply replace the common base with base_path.

  • execute (bool, optional, default=False) – If True, clone the files immediately.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

Friend – A new friend tree with the cloned chunks.

Notes

The naming format is the same as dump(), with the following additional keys:

  • {source0}, {source1}, … : source.path.parts without suffixes in reversed order.

where the source is the chunk to be cloned.

integrity(executor=None)[source]#

Check and report the following:

  • integrity() for all target and friend chunks

  • multiple friend chunks from the same source

  • mismatch in number of entries or branches

  • gaps or overlaps between friend chunks

  • in-memory data

This method can be very expensive for large friend trees.

Parameters:

executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

to_json()[source]#

Convert self to JSON data.

Returns:

dict – JSON data.

classmethod from_json(data)[source]#

Create Friend from JSON data.

Parameters:

data (dict) – JSON data.

Returns:

Friend – A Friend object from JSON data.

copy()[source]#
Returns:

Friend – A shallow copy of self.

class heptools.root.Chain[source]#

A TChain like object to manage multiple Chunk and Friend.

The structure of the output record is given by the following pseudocode:

  • library='ak':
    record[main.branch] = array
    record[friend.name][friend.branch] = array
    
  • library='pd', 'np':
    record[main.branch] = array
    record[friend.branch] = array
    

If duplicate branches are found after renaming, the one from the friend tree that appears last will be kept.

Notes

The following special methods are implemented:

add_chunk(*chunks)[source]#

Add Chunk to this chain.

Parameters:

chunks (tuple[Chunk]) – Chunks to add.

Returns:

self (Chain)

add_friend(*friends, renaming=None)[source]#

Add new Friend to this chain or merge them into the existing ones.

Parameters:
  • friends (tuple[Friend]) – Friends to add or merge.

  • renaming (str or Callable, optional) – If given, the branches in the friend trees will be renamed. See below for available keys.

Returns:

self (Chain)

Notes

The following keys are available for renaming:

If the renaming function returns a tuple, the data will be stored in a nested record.
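
A sketch of assembling a chain from the chunks and friend used in the earlier examples:

>>> from heptools.root import Chain
>>> chain = Chain()
>>> chain.add_chunk(*chunks)
>>> chain.add_friend(friend)

The merged data can then be read back with concat(), iterate() or dask().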

copy()[source]#
Returns:

Chain – A shallow copy of self.

concat(library='ak', reader_options=None, friend_only=False)[source]#

Read all chunks and friend trees into one record.

Parameters:
  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

RecordLike – Concatenated data.

iterate(step=..., library='ak', mode='partition', reader_options=None, friend_only=False)[source]#

Iterate over chunks and friend trees.

Parameters:
  • step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps. See iterate() for details.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Yields:

RecordLike – A chunk of merged data from main and friend TTree.
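
For instance, processing the chain in steps of 100,000 entries (the step size is illustrative):

>>> for batch in chain.iterate(step=100_000, library='ak'):
>>>     ...  # each batch merges main and friend branches for one range of entries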

dask(partition=..., library='ak', reader_options=None, friend_only=False)[source]#

Read chunks and friend trees into delayed arrays.

Warning

The renaming option will be ignored when using library='ak'.

Parameters:
  • partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

DelayedRecordLike – Delayed data from main and friend TTree.
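
A sketch of building delayed arrays, e.g. to hand off to a dask scheduler later (the partition size is illustrative):

>>> delayed = chain.dask(partition=500_000, library='ak')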