Tree#

class heptools.root.Chunk(source=None, name='Events', branches=..., num_entries=..., entry_start=..., entry_stop=..., fetch=False)[source]#

A chunk of TTree stored in a ROOT file.

Parameters:
  • source (PathLike or tuple[PathLike, UUID]) – Path to the ROOT file, optionally with its UUID.

  • name (str, optional, default='Events') – Name of TTree.

  • branches (Iterable[str], optional) – Names of branches. If not given, read from source.

  • num_entries (int, optional) – Number of entries. If not given, read from source.

  • entry_start (int, optional) – Start entry. If not given, set to 0.

  • entry_stop (int, optional) – Stop entry. If not given, set to num_entries.

  • fetch (bool, optional, default=False) – Fetch missing metadata from source immediately after initialization.

Notes

The following special methods are implemented:

  • __hash__()

  • __eq__()

  • __len__()

  • __repr__()
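
For illustration, a chunk might be created as follows (a minimal sketch; the file path is a placeholder):

>>> from heptools.root import Chunk
>>> chunk = Chunk(
>>>     source='root://host//store/data.root',  # placeholder path
>>>     name='Events',
>>>     fetch=True)  # read branches and num_entries from the file immediately
>>> len(chunk)  # __len__ is implemented; presumably the number of entries in this chunk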

path: heptools.system.eos.EOS#

Path to ROOT file.

Type:

EOS

uuid: uuid.UUID#

UUID of ROOT file.

Type:

UUID

name: str#

Name of TTree.

Type:

str

branches: frozenset[str]#

Names of branches.

Type:

frozenset[str]

num_entries: int#

Number of entries.

Type:

int

property entry_start#

Start entry.

Type:

int

property entry_stop#

Stop entry.

Type:

int

property offset#

Equal to entry_start.

Type:

int

integrity()[source]#

Check the stored metadata against the source file and report any inconsistencies.

Returns:

Chunk or None – A deep copy of self with corrected metadata. If the file does not exist, return None.

deepcopy(**kwargs)[source]#
Parameters:

**kwargs (dict, optional) – Override entry_start, entry_stop or branches.

Returns:

Chunk – A deep copy of self.

key()[source]#
Returns:

Chunk – A deep copy of self that keeps only the properties used by __hash__.

slice(start, stop)[source]#
Parameters:
  • start (int) – Entry start.

  • stop (int) – Entry stop.

Returns:

Chunk – A sliced deepcopy() of self from start + offset to stop + offset.
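
A small sketch of how slicing interacts with offset (the source tuple and entry numbers are placeholders):

>>> chunk = Chunk(
>>>     source=('root://host//store/data.root', uuid),
>>>     name='Events',
>>>     entry_start=100,
>>>     entry_stop=200)
>>> sub = chunk.slice(10, 20)
>>> sub.entry_start, sub.entry_stop  # (110, 120), i.e. shifted by offset = 100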

classmethod from_path(*paths, executor=None)[source]#

Create Chunk from paths and fetch metadata in parallel.

Parameters:
  • paths (tuple[tuple[str, str]]) – Pairs of ROOT file path and TTree name.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

list[Chunk] – List of chunks from paths.
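
For example, a sketch using concurrent.futures.ThreadPoolExecutor, which provides map() (the paths are placeholders):

>>> from concurrent.futures import ThreadPoolExecutor
>>> chunks = Chunk.from_path(
>>>     ('root://host//store/a.root', 'Events'),
>>>     ('root://host//store/b.root', 'Events'),
>>>     executor=ThreadPoolExecutor(max_workers=2))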

classmethod common(*chunks)[source]#

Find common branches of chunks.

Parameters:

chunks (tuple[Chunk]) – Chunks from which to select common branches.

Returns:

list[Chunk] – Deep copies of chunks with only common branches.

classmethod partition(size, *chunks, common_branches=False)[source]#

Partition chunks into groups. The sum of entries in each group equals size, except possibly for the last group. The order of chunks is preserved.

Parameters:
  • size (int) – Size of each group.

  • chunks (tuple[Chunk]) – Chunks to partition.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – A group of chunks with total entries equal to size.
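
For example, splitting the chunks from the previous sketch into groups of roughly 100,000 entries (the group size is illustrative):

>>> for group in Chunk.partition(100_000, *chunks, common_branches=True):
>>>     total = sum(len(chunk) for chunk in group)  # 100_000, except possibly for the last group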

classmethod balance(size, *chunks, common_branches=False)[source]#

Split chunks into smaller pieces with size entries each. If this is not possible, another size that minimizes the average deviation is chosen.

Parameters:
  • size (int) – Target number of entries in each chunk.

  • chunks (tuple[Chunk]) – Chunks to balance.

  • common_branches (bool, optional, default=False) – If True, only common branches of all chunks are kept.

Yields:

list[Chunk] – Resized chunks with about size entries in each.

to_json()[source]#

Convert self to JSON data.

Returns:

dict – JSON data.

classmethod from_json(data)[source]#

Create Chunk from JSON data.

Parameters:

data (dict) – JSON data.

Returns:

Chunk – A Chunk object from JSON data.
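
A sketch of a JSON round trip, e.g. to checkpoint chunk metadata to disk:

>>> import json
>>> serialized = json.dumps(chunk.to_json())
>>> restored = Chunk.from_json(json.loads(serialized))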

classmethod from_coffea_events(events)[source]#

Create Chunk when using coffea<=0.7.22.

Parameters:

events – Events generated by coffea.processor.Runner.

Returns:

Chunk – Chunk from events.

classmethod from_coffea_datasets(datasets)[source]#

Create Chunk when using coffea>=2023.12.0.

Parameters:

datasets – Datasets generated by coffea.dataset_tools.preprocess().

Returns:

dict[str, list[Chunk]] – A mapping from dataset names to lists of chunks using the partitions from datasets.

class heptools.root.Friend(name)[source]#

A tool to create and manage a collection of additional TBranch stored in separate ROOT files (also known as a friend TTree).

Parameters:

name (str) – Name of the collection.

Notes

The following special methods are implemented:

allow_missing#

name: str#

Name of the collection.

Type:

str

property branches#

All branches in the friend tree.

Type:

frozenset[str]

property targets#

All contiguous target chunks.

Type:

Generator[Chunk]

property n_fragments#

Number of friend tree files.

Type:

int

property n_entries#

Number of entries in the friend tree.

Type:

int

auto_dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#

Automatically dump the in-memory data when add() is called. The parameters are the same as dump().

Notes

Enable the auto-dump mode by using the with statement:

>>> with friend.auto_dump():
>>>     ...
>>>     friend.add(target, data)
>>>     ...
add(target, data)[source]#

Create a friend TTree for target using data.

Parameters:
  • target (Chunk) – A chunk of TTree.

  • data (RecordLike or Chunk) – Additional branches added to target.
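
A minimal sketch of attaching new branches (the path, uuid and values are placeholders; data is assumed to provide one entry per target entry):

>>> import awkward as ak
>>> target = Chunk(
>>>     source=('root://host//a/b/events.root', uuid),
>>>     name='Events',
>>>     entry_start=0,
>>>     entry_stop=3)
>>> friend = Friend('corrections')  # hypothetical collection name
>>> friend.add(target, ak.Array({'weight': [1.0, 1.1, 0.9]}))
>>> friend.dump(base_path='root://host//friends/')  # write the in-memory data to ROOT files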

arrays(target, library='ak', reader_options=None)[source]#

Fetch the friend TTree for target into an array.

Parameters:
  • target (Chunk) – A chunk of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

RecordLike – Data from friend TTree.

concat(*targets, library='ak', reader_options=None)[source]#

Fetch the friend TTree for targets into one array.

Parameters:
  • targets (tuple[Chunk]) – One or more chunks of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

RecordLike – Concatenated data.
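
For reading back, a sketch assuming the friend tree already covers the requested chunks:

>>> arr = friend.arrays(target, library='ak')  # friend branches for a single chunk
>>> df = friend.concat(*chunks, library='pd')  # friend branches for several chunks in one DataFrame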

dask(*targets, library='ak', reader_options=None)[source]#

Fetch the friend TTree for targets as delayed arrays. The partitions will be preserved.

Parameters:
  • targets (tuple[Chunk]) – Partitions of target TTree.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

Returns:

DelayedRecordLike – Delayed arrays of entries from the friend TTree.

dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#

Dump all in-memory data to ROOT files with a given naming format.

Parameters:
  • base_path (PathLike, optional) – Base path to store the dumped files. See below for details.

  • naming (str or Callable, default="{name}_{uuid}_{start}_{stop}.root") – Naming format for the dumped files. See below for details.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.

Notes

Each dumped file will be stored in {base_path}/{naming.format(**keys)}. If base_path is not given, the corresponding target.path.parent will be used. The following keys are available:

  • {name}: name.

  • {uuid}: target.uuid

  • {tree}: target.name

  • {start}: target.entry_start

  • {stop}: target.entry_stop

  • {path0}, {path1}, … : target.path.parts in reversed order.

where the target is the one passed to add(). To apply operations beyond the built-in str.format() syntax, use a Callable instead.

Warning

The generated path is not guaranteed to be unique. If multiple chunks are dumped to the same path, the last one will overwrite the previous ones.

Examples

The naming format works as follows:

>>> friend = Friend('test')
>>> friend.add(
>>>     Chunk(
>>>         source=('root://host.1//a/b/c/target.root', uuid),
>>>         name='Events',
>>>         entry_start=100,
>>>         entry_stop=200,
>>>     ),
>>>     data
>>> )
>>> # write to root://host.1//a/b/c/test_uuid_Events_100_200_target.root
>>> friend.dump(
>>>     '{name}_{uuid}_{tree}_{start}_{stop}_{path0}')
>>> # or write to root://host.2//x/y/z/b/c/test_uuid_100_200.root
>>> friend.dump(
>>>     '{path2}/{path1}/{name}_{uuid}_{start}_{stop}.root',
>>>     'root://host.2//x/y/z/')
>>> # or write to root://host.1//a/b/c/tar_events_100_200.root
>>> def filename(**kwargs: str) -> str:
>>>     return f'{kwargs["path0"][:3]}_{kwargs["tree"]}_{kwargs["start"]}_{kwargs["stop"]}.root'.lower()
>>> friend.dump(filename)
cleanup(executor=None)[source]#

Remove invalid chunks.

Parameters:

executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

Friend – A copy of self with invalid chunks removed.

update(paths)[source]#

Update the paths of friend chunks using the new paths.

Parameters:

paths (Iterable[Chunk]) – Chunks with uuid and new path.

Returns:

Friend – A copy of self with paths updated.

reset(confirm=True, executor=None)[source]#

Reset the friend tree and delete all dumped files.

Parameters:
  • confirm (bool, optional, default=True) – Confirm the deletion.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

merge(step, chunk_size=..., base_path=..., naming='{name}_{uuid}_{start}_{stop}.root', reader_options=None, writer_options=None, clean=True, executor=None, transform=None, dask=False)[source]#

Merge contiguous chunks into a single file.

Warning

  • executor and dask cannot be used together.

  • If chunk_size is provided, using dask=True will give better parallelism.

Parameters:
  • step (int) – Number of entries to read and write in each iteration step.

  • chunk_size (int, optional) – Number of entries in each new chunk. If not given, all entries will be merged into one chunk.

  • base_path (PathLike, optional) – Base path to store the merged files. See notes of dump() for details.

  • naming (str or Callable, optional) – Naming format for the merged files. See notes of dump() for details.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • clean (bool, optional, default=True) – If True, clean the original friend chunks after merging.

  • executor (Executor, optional) – An executor with at least the submit() method implemented.

  • transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Friend or Delayed or Future[Friend] – A new friend tree with the merged chunks.
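
A sketch with illustrative sizes, writing the merged files under a placeholder base path:

>>> merged = friend.merge(
>>>     step=100_000,          # entries read and written per iteration
>>>     chunk_size=1_000_000,  # entries per merged chunk
>>>     base_path='root://host//friends/merged/')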

clone(base_path, naming=..., execute=False, executor=None)[source]#

Copy all chunks to a new location.

Parameters:
  • base_path (PathLike) – Base path to store the cloned files.

  • naming (str or Callable, optional) – Naming format for the cloned files. See below for details. If not given, will simply replace the common base with base_path.

  • execute (bool, optional, default=False) – If True, clone the files immediately.

  • executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

Returns:

Friend – A new friend tree with the cloned chunks.

Notes

The naming format is the same as dump(), with the following additional keys:

  • {source0}, {source1}, … : source.path.parts without suffixes in reversed order.

where the source is the chunk to be cloned.

integrity(executor=None)[source]#

Check and report the following:

  • integrity() for all target and friend chunks

  • multiple friend chunks from the same source

  • mismatch in number of entries or branches

  • gaps or overlaps between friend chunks

  • in-memory data

This method can be very expensive for large friend trees.

Parameters:

executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.

to_json()[source]#

Convert self to JSON data.

Returns:

dict – JSON data.

classmethod from_json(data)[source]#

Create Friend from JSON data.

Parameters:

data (dict) – JSON data.

Returns:

Friend – A Friend object from JSON data.

copy()[source]#
Returns:

Friend – A shallow copy of self.

class heptools.root.Chain[source]#

A TChain like object to manage multiple Chunk and Friend.

The structure of the output record is given by the following pseudocode:

  • library='ak':
    record[main.branch] = array
    record[friend.name][friend.branch] = array
    
  • library='pd', 'np':
    record[main.branch] = array
    record[friend.branch] = array
    

If duplicate branches are found after renaming, the one from the friend tree that appears last will be kept.

Notes

The following special methods are implemented:

add_chunk(*chunks)[source]#

Add Chunk to this chain.

Parameters:

chunks (tuple[Chunk]) – Chunks to add.

Returns:

self (Chain)

add_friend(*friends, renaming=None)[source]#

Add new Friend to this chain or merge them into the existing ones.

Parameters:
  • friends (tuple[Friend]) – Friends to add or merge.

  • renaming (str or Callable, optional) – If given, the branches in the friend trees will be renamed. See below for available keys.

Returns:

self (Chain)

Notes

The following keys are available for renaming:

If the renaming function returns a tuple, the data will be stored in a nested record.
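
A sketch of assembling a chain from the chunks and friend used in the earlier examples:

>>> from heptools.root import Chain
>>> chain = Chain()
>>> chain.add_chunk(*chunks)
>>> chain.add_friend(friend)

The merged data can then be read back with concat(), iterate() or dask().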

copy()[source]#
Returns:

Chain – A shallow copy of self.

concat(library='ak', reader_options=None, friend_only=False)[source]#

Read all chunks and friend trees into one record.

Parameters:
  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

RecordLike – Concatenated data.

iterate(step=..., library='ak', mode='partition', reader_options=None, friend_only=False)[source]#

Iterate over chunks and friend trees.

Parameters:
  • step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps. See iterate() for details.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Yields:

RecordLike – A chunk of merged data from main and friend TTree.
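
For instance, processing the chain in steps of 100,000 entries (the step size is illustrative):

>>> for batch in chain.iterate(step=100_000, library='ak'):
>>>     ...  # each batch merges main and friend branches for one range of entries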

dask(partition=..., library='ak', reader_options=None, friend_only=False)[source]#

Read chunks and friend trees into delayed arrays.

Warning

The renaming option will be ignored when using library='ak'.

Parameters:
  • partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • friend_only (bool, optional, default=False) – If True, only read friend trees.

Returns:

DelayedRecordLike – Delayed data from main and friend TTree.
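
A sketch of building delayed arrays, e.g. to hand off to a dask scheduler later (the partition size is illustrative):

>>> delayed = chain.dask(partition=500_000, library='ak')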