Tree#
- class heptools.root.Chunk(source=None, name='Events', branches=..., num_entries=..., entry_start=..., entry_stop=..., fetch=False)[source]#
A chunk of TTree stored in a ROOT file.
- Parameters:
source (PathLike or tuple[PathLike, UUID]) – Path to ROOT file with optional UUID.
name (str, optional, default='Events') – Name of TTree.
branches (Iterable[str], optional) – Names of branches. If not given, read from source.
num_entries (int, optional) – Number of entries. If not given, read from source.
entry_start (int, optional) – Start entry. If not given, set to 0.
entry_stop (int, optional) – Stop entry. If not given, set to num_entries.
fetch (bool, optional, default=False) – Fetch missing metadata from source immediately after initialization.
Notes
The following special methods are implemented:
__hash__()
__eq__()
__len__()
__repr__()
- path: heptools.system.eos.EOS#
Path to ROOT file.
- Type:
heptools.system.eos.EOS
- integrity()[source]#
Check and report the following:
path does not exist
uuid differs from file
num_entries differs from file
branches not in file
entry_start out of range
entry_stop out of range
- Returns:
Chunk or None – A deep copy of self with corrected metadata. If the file does not exist, return None.
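The checks listed above can be illustrated with a standalone sketch that does not call heptools. It compares two plain dicts, hypothetical stand-ins for a Chunk and for the metadata read from the file. The exact range conditions used for entry_start and entry_stop are assumptions, and the library may report problems differently.

```python
def check_metadata(chunk, on_file):
    """Return a list of problems found when comparing chunk metadata to the file.

    Both arguments are plain dicts standing in for a Chunk and for metadata
    read from the ROOT file (hypothetical stand-ins, not heptools objects).
    Pass on_file=None to model a missing file.
    """
    if on_file is None:
        return ["path does not exist"]
    problems = []
    if chunk["uuid"] != on_file["uuid"]:
        problems.append("uuid differs from file")
    if chunk["num_entries"] != on_file["num_entries"]:
        problems.append("num_entries differs from file")
    missing = sorted(set(chunk["branches"]) - set(on_file["branches"]))
    if missing:
        problems.append(f"branches not in file: {missing}")
    total = on_file["num_entries"]
    # Assumed valid ranges: 0 <= entry_start <= total
    # and entry_start <= entry_stop <= total.
    if not 0 <= chunk["entry_start"] <= total:
        problems.append("entry_start out of range")
    if not chunk["entry_start"] <= chunk["entry_stop"] <= total:
        problems.append("entry_stop out of range")
    return problems


on_file = {"uuid": "abc", "num_entries": 1000, "branches": {"pt", "eta"}}
chunk = {
    "uuid": "abc",
    "num_entries": 1200,
    "branches": ["pt", "phi"],
    "entry_start": 0,
    "entry_stop": 1200,
}
problems = check_metadata(chunk, on_file)
```

Here the stale num_entries, the unknown branch and the oversized entry_stop are each reported once, while a missing file short-circuits all other checks.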
- deepcopy(**kwargs)[source]#
- Parameters:
**kwargs (dict, optional) – Override entry_start, entry_stop or branches.
- Returns:
Chunk – A deep copy of self.
- slice(start, stop)[source]#
- Parameters:
- Returns:
Chunk – A sliced deepcopy() of self from start + offset to stop + offset.
- classmethod from_path(*paths, executor=None)[source]#
Create Chunk from paths and fetch metadata in parallel.
- classmethod partition(size, *chunks, common_branches=False)[source]#
Partition chunks into groups. The sum of entries in each group is equal to size except for the last one. The order of chunks is preserved.
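The grouping rule can be illustrated with a standalone sketch that does not call heptools. Each chunk is modelled as a hypothetical (label, start, stop) tuple, and chunks are cut where needed so that every group except possibly the last holds exactly size entries. How the library treats branches and common_branches is not covered here.

```python
def partition_ranges(size, *chunks):
    """Yield groups of (label, start, stop) pieces totalling `size` entries each.

    `chunks` are hypothetical (label, start, stop) tuples standing in for
    Chunk objects; only the last group may hold fewer than `size` entries.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    group, remaining = [], size
    for label, start, stop in chunks:
        while start < stop:
            # Take as many entries as still fit into the current group.
            take = min(remaining, stop - start)
            group.append((label, start, start + take))
            start += take
            remaining -= take
            if remaining == 0:
                yield group
                group, remaining = [], size
    if group:
        yield group


groups = list(partition_ranges(100, ("a.root", 0, 150), ("b.root", 0, 120)))
```

With these inputs the second group spans both files, which shows that the input order is preserved while chunk boundaries are not.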
- classmethod balance(size, *chunks, common_branches=False)[source]#
Split chunks into smaller pieces with size entries in each. If that is not possible, try to find another size that minimizes the average deviation.
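One possible reading of this rule is sketched below for a single entry range, without calling heptools. The number of pieces is chosen by rounding the ratio of total entries to size, and the entries are then spread as evenly as possible. This criterion is an assumption made for illustration; the library may define and minimize the deviation differently.

```python
def balance_range(size, start, stop):
    """Split [start, stop) into near-equal pieces with a length close to `size`.

    Illustrative only: the piece count is round(total / size), at least 1.
    """
    total = stop - start
    if size <= 0 or total <= 0:
        return []
    count = max(1, round(total / size))
    base, extra = divmod(total, count)
    pieces, cursor = [], start
    for i in range(count):
        # The first `extra` pieces get one additional entry.
        length = base + (1 if i < extra else 0)
        pieces.append((cursor, cursor + length))
        cursor += length
    return pieces


exact = balance_range(100, 0, 300)
uneven = balance_range(100, 0, 260)
```

When the total is a multiple of size, every piece has exactly size entries; otherwise the pieces differ in length by at most one entry.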
- classmethod from_coffea_events(events)[source]#
Create Chunk when using coffea<=0.7.22.
- Parameters:
events – Events generated by coffea.processor.Runner.
- Returns:
Chunk – Chunk from events.
- classmethod from_coffea_datasets(datasets)[source]#
Create Chunk when using coffea>=2023.12.0.
- Parameters:
datasets – Datasets generated by coffea.dataset_tools.preprocess().
- Returns:
dict[str, list[Chunk]] – A mapping from dataset names to lists of chunks using the partitions from datasets.
- class heptools.root.Friend(name)[source]#
A tool to create and manage a collection of additional TBranch stored in separate ROOT files (also known as a friend TTree).
- Parameters:
name (str) – Name of the collection.
Notes
The following special methods are implemented:
__iadd__() Friend
__add__() Friend
__repr__()
__enter__(): See auto_dump().
__exit__(): See auto_dump().
- allow_missing#
- auto_dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#
Automatically dump the in-memory data when add() is called. The parameters are the same as dump().
Notes
Enable the auto-dump mode by using the with statement:
>>> with friend.auto_dump():
>>>     ...
>>>     friend.add(target, data)
>>>     ...
- arrays(target, library='ak', reader_options=None)[source]#
Fetch the friend TTree for target into an array.
- concat(*targets, library='ak', reader_options=None)[source]#
Fetch the friend TTree for targets into one array.
- dask(*targets, library='ak', reader_options=None)[source]#
Fetch the friend TTree for targets as delayed arrays. The partitions will be preserved.
- Parameters:
- Returns:
DelayedRecordLike – Delayed arrays of entries from the friend TTree.
- dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#
Dump all in-memory data to ROOT files with a given naming format.
- Parameters:
base_path (PathLike, optional) – Base path to store the dumped files. See below for details.
naming (str or Callable, default="{name}_{uuid}_{start}_{stop}.root") – Naming format for the dumped files. See below for details.
writer_options (dict, optional) – Additional options passed to TreeWriter.
executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.
Notes
Each dumped file will be stored in {base_path}/{naming.format(**keys)}. If base_path is not given, the corresponding target.path.parent will be used. The following keys are available:
{name}: name
{uuid}: target.uuid
{tree}: target.name
{start}: target.entry_start
{stop}: target.entry_stop
{path0}, {path1}, … : target.path.parts in reversed order
where the target is the one passed to add(). To apply operations beyond the built-in str.format() syntax, use a Callable instead.
Warning
The generated path is not guaranteed to be unique. If multiple chunks are dumped to the same path, the last one will overwrite the previous ones.
Examples
The naming format works as follows:
>>> friend = Friend('test')
>>> friend.add(
>>>     Chunk(
>>>         source=('root://host.1//a/b/c/target.root', uuid),
>>>         name='Events',
>>>         entry_start=100,
>>>         entry_stop=200,
>>>     ),
>>>     data
>>> )
>>> # write to root://host.1//a/b/c/test_uuid_Events_100_200_target.root
>>> friend.dump(
>>>     naming='{name}_{uuid}_{tree}_{start}_{stop}_{path0}')
>>> # or write to root://host.2//x/y/z/b/c/test_uuid_100_200.root
>>> friend.dump(
>>>     naming='{path2}/{path1}/{name}_{uuid}_{start}_{stop}.root',
>>>     base_path='root://host.2//x/y/z/')
>>> # or write to root://host.1//a/b/c/tar_events_100_200.root
>>> def filename(**kwargs: str) -> str:
>>>     return f'{kwargs["path0"][:3]}_{kwargs["tree"]}_{kwargs["start"]}_{kwargs["stop"]}.root'.lower()
>>> friend.dump(naming=filename)
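The key expansion can also be reproduced without heptools, as a rough sketch. The helper format_keys below is hypothetical; it builds the documented keys from plain values and a local-style path, and it assumes that the leading "/" is not counted among the path parts. Handling of the root:// URL prefix is left out.

```python
from pathlib import PurePosixPath


def format_keys(name, path, uuid, tree, start, stop):
    """Build the documented naming keys for one target (standalone sketch)."""
    keys = {"name": name, "uuid": uuid, "tree": tree, "start": start, "stop": stop}
    # {path0}, {path1}, ... are the path components in reversed order.
    # Assumption: the root "/" is dropped before numbering.
    parts = [p for p in PurePosixPath(path).parts if p != "/"]
    for i, part in enumerate(reversed(parts)):
        keys[f"path{i}"] = part
    return keys


keys = format_keys("test", "/a/b/c/target.root", "uuid", "Events", 100, 200)
# Unused keys are simply ignored by str.format().
default_name = "{name}_{uuid}_{start}_{stop}.root".format(**keys)
nested_name = "{path2}/{path1}/{name}_{uuid}_{start}_{stop}.root".format(**keys)
```

Because the parts are numbered from the file name outwards, {path0} is always the file name and higher indices walk up the directory tree.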
- merge(step, chunk_size=..., base_path=..., naming='{name}_{uuid}_{start}_{stop}.root', reader_options=None, writer_options=None, clean=True, executor=None, transform=None, dask=False)[source]#
Merge contiguous chunks into a single file.
Warning
executor and dask cannot be used together.
If chunk_size is provided, dask will provide better parallelism.
- Parameters:
step (int) – Number of entries to read and write in each iteration step.
chunk_size (int, optional) – Number of entries in each new chunk. If not given, all entries will be merged into one chunk.
base_path (PathLike, optional) – Base path to store the merged files. See notes of dump() for details.
naming (str or Callable, optional) – Naming format for the merged files. See notes of dump() for details.
reader_options (dict, optional) – Additional options passed to TreeReader.
writer_options (dict, optional) – Additional options passed to TreeWriter.
clean (bool, optional, default=True) – If True, clean the original friend chunks after merging.
executor (Executor, optional) – An executor with at least the submit() method implemented.
transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.
dask (bool, optional, default=False) – If True, return a Delayed object.
- Returns:
Friend or Delayed or Future[Friend] – A new friend tree with the merged chunks.
- clone(base_path, naming=..., execute=False, executor=None)[source]#
Copy all chunks to a new location.
- Parameters:
base_path (PathLike) – Base path to store the cloned files.
naming (str or Callable, optional) – Naming format for the cloned files. See below for details. If not given, will simply replace the common base with base_path.
execute (bool, optional, default=False) – If True, clone the files immediately.
executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.
- Returns:
Friend – A new friend tree with the cloned chunks.
Notes
The naming format is the same as dump(), with the following additional keys:
{source0}, {source1}, … : source.path.parts without suffixes in reversed order
where the source is the chunk to be cloned.
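When naming is omitted, the common base of the existing files is replaced with base_path. A minimal standalone sketch of that idea follows. The helper rebase is hypothetical, and it assumes that "common base" means the longest directory shared by all files; the library may determine it differently.

```python
import posixpath
from pathlib import PurePosixPath


def rebase(paths, base_path):
    """Replace the directory shared by all `paths` with `base_path`."""
    # Use the parent directories so that a single file keeps its file name.
    common = posixpath.commonpath([posixpath.dirname(p) for p in paths])
    return [
        str(PurePosixPath(base_path) / PurePosixPath(p).relative_to(common))
        for p in paths
    ]


cloned = rebase(["/a/b/x/f1.root", "/a/b/y/f2.root"], "/new/place")
```

The sub-directories below the shared base are kept, so files with the same name in different directories do not collide after cloning.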
- integrity(executor=None)[source]#
Check and report the following:
integrity() for all target and friend chunks
multiple friend chunks from the same source
mismatch in number of entries or branches
gaps or overlaps between friend chunks
in-memory data
This method can be very expensive for large friend trees.
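The gap and overlap check can be pictured with a small standalone sketch that works on plain (start, stop) ranges rather than heptools objects. The function name and the choice to compare against a target range are assumptions made for illustration.

```python
def find_gaps_and_overlaps(ranges, start, stop):
    """Compare friend chunk ranges against the target range [start, stop).

    Returns two lists of (start, stop) tuples: entries covered by no friend
    chunk, and entries covered more than once.
    """
    gaps, overlaps = [], []
    cursor = start  # first entry not yet covered
    for lo, hi in sorted(ranges):
        if lo > cursor:
            gaps.append((cursor, lo))
        elif lo < cursor:
            overlaps.append((lo, min(cursor, hi)))
        cursor = max(cursor, hi)
    if cursor < stop:
        gaps.append((cursor, stop))
    return gaps, overlaps


gaps, overlaps = find_gaps_and_overlaps([(0, 100), (90, 200), (250, 300)], 0, 400)
```

Sorting the ranges first means a single pass is enough, which matters when a friend tree holds many chunks.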
- class heptools.root.Chain[source]#
A TChain-like object to manage multiple Chunk and Friend.
The structure of the output record is given by the following pseudo code:
library='ak':
record[main.branch] = array
record[friend.name][friend.branch] = array
library='pd', 'np':
record[main.branch] = array
record[friend.branch] = array
If duplicate branches are found after rename, the one in the friend tree that appears last will be kept.
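The two layouts can be mimicked with plain dicts and lists, as in the sketch below. It does not call heptools, and build_record is a hypothetical helper. In the flat layout, later friends overwrite earlier entries with the same branch name, which is how the rule about duplicates is modelled here.

```python
def build_record(main, friends, library="ak"):
    """Assemble a record from plain dicts, mirroring the documented layout.

    main: dict mapping branch name to array.
    friends: list of (friend_name, dict mapping branch name to array).
    """
    record = dict(main)
    for friend_name, branches in friends:
        if library == "ak":
            # Nested layout: record[friend.name][friend.branch]
            record.setdefault(friend_name, {}).update(branches)
        else:
            # Flat layout: the friend added last wins on duplicates.
            record.update(branches)
    return record


main = {"pt": [1, 2]}
friends = [("f1", {"w": [0.1, 0.2]}), ("f2", {"w": [0.3, 0.4]})]
nested = build_record(main, friends, library="ak")
flat = build_record(main, friends, library="np")
```

The nested layout keeps both copies of w under their friend names, while the flat layout keeps only the one from f2.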
Notes
The following special methods are implemented:
- add_friend(*friends, renaming=None)[source]#
Add new Friend to this chain or merge to the existing ones.
- Parameters:
- Returns:
self (Chain)
Notes
The following keys are available for renaming:
{friend}: Friend.name
{branch}: branch name
If the renaming function returns a tuple, the data will be stored in a nested record.
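A rough standalone sketch of how such a renaming could be applied is given below. Both helpers are hypothetical, and passing friend and branch to the callable as keyword arguments is an assumption; the parameter description for renaming is not shown in this section.

```python
def apply_renaming(renaming, friend, branch):
    """Return the new key for a branch, from a format string or a callable."""
    if callable(renaming):
        # Assumption: the callable receives the keys as keyword arguments.
        return renaming(friend=friend, branch=branch)
    return renaming.format(friend=friend, branch=branch)


def store(record, key, array):
    """Store `array` under `key`; a tuple key creates nested records."""
    if isinstance(key, tuple):
        *outer, last = key
        for name in outer:
            record = record.setdefault(name, {})
        record[last] = array
    else:
        record[key] = array


record = {}
store(record, apply_renaming("{friend}_{branch}", "sys", "weight"), [1.0])
store(record, apply_renaming(lambda friend, branch: (friend, branch), "sys", "weight"), [2.0])
```

A string result gives a flat key such as sys_weight, while a tuple result places the same branch at record["sys"]["weight"].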
- concat(library='ak', reader_options=None, friend_only=False)[source]#
Read all chunks and friend trees into one record.
- Parameters:
- Returns:
RecordLike – Concatenated data.
- iterate(step=..., library='ak', mode='partition', reader_options=None, friend_only=False)[source]#
Iterate over chunks and friend trees.
- Parameters:
step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.
library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.
mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps. See iterate() for details.
reader_options (dict, optional) – Additional options passed to TreeReader.
friend_only (bool, optional, default=False) – If True, only read friend trees.
- Yields:
RecordLike – A chunk of merged data from main and friend TTree.
- dask(partition=..., library='ak', reader_options=None, friend_only=False)[source]#
Read chunks and friend trees into delayed arrays.
Warning
The renaming option will be ignored when using library='ak'.
- Parameters:
partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.
library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.
reader_options (dict, optional) – Additional options passed to TreeReader.
friend_only (bool, optional, default=False) – If True, only read friend trees.
- Returns:
DelayedRecordLike – Delayed data from main and friend TTree.