Tree#
- class heptools.root.Chunk(source=None, name='Events', branches=..., num_entries=..., entry_start=..., entry_stop=..., fetch=False)[source]#
A chunk of TTree stored in a ROOT file.
- Parameters:
source (PathLike or tuple[PathLike, UUID]) – Path to the ROOT file, with an optional UUID.
name (str, optional, default='Events') – Name of the TTree.
branches (Iterable[str], optional) – Names of branches. If not given, read from source.
num_entries (int, optional) – Number of entries. If not given, read from source.
entry_start (int, optional) – Start entry. If not given, set to 0.
entry_stop (int, optional) – Stop entry. If not given, set to num_entries.
fetch (bool, optional, default=False) – Fetch missing metadata from source immediately after initialization.
Notes
The following special methods are implemented:
__hash__()
__eq__()
__len__()
__repr__()
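A minimal construction sketch; the file path and branch names below are hypothetical:
>>> from heptools.root import Chunk
>>> chunk = Chunk(
>>>     source='root://host//path/to/events.root',
>>>     name='Events',
>>>     branches=['run', 'event'],
>>>     fetch=True,  # read missing metadata (e.g. num_entries) immediately
>>> )
>>> len(chunk)  # __len__() is implemented, presumably the number of entries in the chunk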
- path: heptools.system.eos.EOS#
Path to the ROOT file.
- Type: EOS
- integrity()[source]#
Check and report the following:
- path does not exist
- uuid different from the file
- num_entries different from the file
- branches not in the file
- entry_start out of range
- entry_stop out of range
- Returns:
Chunk or None – A deep copy of self with corrected metadata. If the file does not exist, return None.
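A hedged check-and-repair sketch; the chunks list below is hypothetical:
>>> checked = []
>>> for chunk in chunks:
>>>     fixed = chunk.integrity()  # deep copy with corrected metadata, or None if the file is missing
>>>     if fixed is not None:
>>>         checked.append(fixed)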
- deepcopy(**kwargs)[source]#
- Parameters:
**kwargs (dict, optional) – Override entry_start, entry_stop or branches.
- Returns:
Chunk – A deep copy of self.
- slice(start, stop)[source]#
- Parameters:
start, stop (int)
- Returns:
Chunk – A sliced deepcopy() of self from start + offset to stop + offset.
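A short copy-and-slice sketch; the branch names and entry values are illustrative:
>>> subset = chunk.deepcopy(branches=['run', 'event'])  # override only the branches
>>> head = chunk.slice(0, 1000)  # per the description above, an offset is added to start and stop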
- classmethod from_path(*paths, executor=None)[source]#
Create Chunk from paths and fetch metadata in parallel.
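A sketch of creating chunks from paths; the paths and executor are hypothetical:
>>> from concurrent.futures import ThreadPoolExecutor
>>> chunks = Chunk.from_path(  # presumably one Chunk per path
>>>     'root://host//a/file1.root',
>>>     'root://host//a/file2.root',
>>>     executor=ThreadPoolExecutor(max_workers=4),  # metadata is fetched in parallel
>>> )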
- classmethod partition(size, *chunks, common_branches=False)[source]#
Partition chunks into groups. The sum of entries in each group is equal to size, except for the last one. The order of chunks is preserved.
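A hedged grouping sketch, assuming chunks is an iterable of Chunk objects; the size is illustrative:
>>> # groups whose entries sum to 100_000, except possibly the last one
>>> groups = Chunk.partition(100_000, *chunks)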
- classmethod balance(size, *chunks, common_branches=False)[source]#
Split chunks into smaller pieces with size entries each. If not possible, try to find another size that minimizes the average deviation.
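A hedged splitting sketch with an illustrative target size:
>>> # pieces of roughly 50_000 entries; the size may be adjusted to minimize the average deviation
>>> pieces = Chunk.balance(50_000, *chunks, common_branches=True)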
- class heptools.root.Friend(name)[source]#
A tool to create and manage a collection of additional TBranch stored in separate ROOT files (also known as a friend TTree).
- Parameters:
name (str) – Name of the collection.
Notes
The following special methods are implemented:
__iadd__(): Friend
__add__(): Friend
__repr__()
__enter__(): See auto_dump().
__exit__(): See auto_dump().
- allow_missing#
- auto_dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#
Automatically dump the in-memory data when add() is called. The parameters are the same as dump().
Notes
Enable the auto-dump mode by using the with statement:
>>> with friend.auto_dump():
>>>     ...
>>>     friend.add(target, data)
>>>     ...
- arrays(target, library='ak', reader_options=None)[source]#
Fetch the friend TTree for target into an array.
- concat(*targets, library='ak', reader_options=None)[source]#
Fetch the friend TTree for targets into one array.
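A reading sketch, assuming friend already holds branches for the hypothetical target chunks below:
>>> arr = friend.arrays(chunk, library='ak')  # friend branches for a single target chunk
>>> combined = friend.concat(chunk1, chunk2, library='ak')  # one array covering several target chunks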
- dask(*targets, library='ak', reader_options=None)[source]#
Fetch the friend TTree for targets as delayed arrays. The partitions will be preserved.
- Parameters:
- Returns:
DelayedRecordLike – Delayed arrays of entries from the friend TTree.
- dump(base_path=..., naming=..., writer_options=None, executor=None)[source]#
Dump all in-memory data to ROOT files with a given naming format.
- Parameters:
base_path (PathLike, optional) – Base path to store the dumped files. See below for details.
naming (str or Callable, default="{name}_{uuid}_{start}_{stop}.root") – Naming format for the dumped files. See below for details.
writer_options (dict, optional) – Additional options passed to TreeWriter.
executor (Executor, optional) – An executor with at least the submit() method implemented. If not provided, the tasks will run sequentially in the current thread.
Notes
Each dumped file will be stored in {base_path}/{naming.format(**keys)}. If base_path is not given, the corresponding target.path.parent will be used. The following keys are available:
{name}: name
{uuid}: target.uuid
{tree}: target.name
{start}: target.entry_start
{stop}: target.entry_stop
{path0}, {path1}, … : target.path.parts in reversed order
where the target is the one passed to add(). To apply operations beyond the built-in str.format() syntax, use a Callable instead.
Warning
The generated path is not guaranteed to be unique. If multiple chunks are dumped to the same path, the last one will overwrite the previous ones.
Examples
The naming format works as follows:
>>> friend = Friend('test')
>>> friend.add(
>>>     Chunk(
>>>         source=('root://host.1//a/b/c/target.root', uuid),
>>>         name='Events',
>>>         entry_start=100,
>>>         entry_stop=200,
>>>     ),
>>>     data
>>> )
>>> # write to root://host.1//a/b/c/test_uuid_Events_100_200_target.root
>>> friend.dump(
>>>     '{name}_{uuid}_{tree}_{start}_{stop}_{path0}')
>>> # or write to root://host.2//x/y/z/b/c/test_uuid_100_200.root
>>> friend.dump(
>>>     '{path2}/{path1}/{name}_{uuid}_{start}_{stop}.root',
>>>     'root://host.2//x/y/z/')
>>> # or write to root://host.1//a/b/c/tar_events_100_200.root
>>> def filename(**kwargs: str) -> str:
>>>     return f'{kwargs["path0"][:3]}_{kwargs["tree"]}_{kwargs["start"]}_{kwargs["stop"]}.root'.lower()
>>> friend.dump(filename)
- merge(step, chunk_size=..., base_path=..., naming='{name}_{uuid}_{start}_{stop}.root', reader_options=None, writer_options=None, clean=True, executor=None, transform=None, dask=False)[source]#
Merge contiguous chunks into a single file.
Warning
executor and dask cannot be used together. If chunk_size is provided, dask will provide better parallelism.
- Parameters:
step (int) – Number of entries to read and write in each iteration step.
chunk_size (int, optional) – Number of entries in each new chunk. If not given, all entries will be merged into one chunk.
base_path (PathLike, optional) – Base path to store the merged files. See notes of dump() for details.
naming (str or Callable, optional) – Naming format for the merged files. See notes of dump() for details.
reader_options (dict, optional) – Additional options passed to TreeReader.
writer_options (dict, optional) – Additional options passed to TreeWriter.
clean (bool, optional, default=True) – If True, clean the original friend chunks after merging.
executor (Executor, optional) – An executor with at least the submit() method implemented.
transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.
dask (bool, optional, default=False) – If True, return a Delayed object.
- Returns:
Friend or Delayed or Future[Friend] – A new friend tree with the merged chunks.
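A hedged merging sketch; the step, chunk size and base path are illustrative:
>>> merged = friend.merge(
>>>     step=100_000,        # entries read and written per iteration
>>>     chunk_size=500_000,  # entries per new chunk
>>>     base_path='root://host//merged/',
>>>     clean=True,          # remove the original friend chunks afterwards
>>> )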
- clone(base_path, naming=..., execute=False, executor=None)[source]#
Copy all chunks to a new location.
- Parameters:
base_path (PathLike) – Base path to store the cloned files.
naming (str or Callable, optional) – Naming format for the cloned files. See below for details. If not given, will simply replace the common base with base_path.
execute (bool, optional, default=False) – If True, clone the files immediately.
executor (Executor, optional) – An executor with at least the map() method implemented. If not provided, the tasks will run sequentially in the current thread.
- Returns:
Friend – A new friend tree with the cloned chunks.
Notes
The naming format is the same as dump(), with the following additional keys:
{source0}, {source1}, … : source.path.parts without suffixes in reversed order
where the source is the chunk to be cloned.
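A cloning sketch; the base path and executor are hypothetical:
>>> from concurrent.futures import ThreadPoolExecutor
>>> cloned = friend.clone(
>>>     'root://host.2//new/base/',
>>>     execute=True,  # copy the files immediately
>>>     executor=ThreadPoolExecutor(max_workers=4),
>>> )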
- integrity(executor=None)[source]#
Check and report the following:
- integrity() for all target and friend chunks
- multiple friend chunks from the same source
- mismatch in the number of entries or branches
- gaps or overlaps between friend chunks
- in-memory data
This method can be very expensive for large friend trees.
- class heptools.root.Chain[source]#
A TChain-like object to manage multiple Chunk and Friend.
The structure of the output record is given by the following pseudo code:
library='ak':
record[main.branch] = array
record[friend.name][friend.branch] = array
library='pd', 'np':
record[main.branch] = array
record[friend.branch] = array
If duplicate branches are found after rename, the one in the friend tree that appears last will be kept.
Notes
The following special methods are implemented:
- add_friend(*friends, renaming=None)[source]#
Add new Friend to this chain or merge into the existing ones.
- Parameters:
- Returns:
self (Chain)
Notes
The following keys are available for renaming:
{friend}: Friend.name
{branch}: branch name
If the renaming function returns a tuple, the data will be stored in a nested record.
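A renaming sketch, assuming renaming accepts the format-string keys listed above; the friend object and prefix scheme are hypothetical:
>>> chain = Chain()
>>> chain.add_friend(
>>>     my_friend,
>>>     renaming='{friend}_{branch}',  # e.g. branch 'pt' of friend 'test' becomes 'test_pt'
>>> )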
- concat(library='ak', reader_options=None, friend_only=False)[source]#
Read all chunks and friend trees into one record.
- Parameters:
- Returns:
RecordLike – Concatenated data.
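A one-shot read sketch:
>>> data = chain.concat(library='ak')  # main chunks and friend trees in a single record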
- iterate(step=..., library='ak', mode='partition', reader_options=None, friend_only=False)[source]#
Iterate over chunks and friend trees.
- Parameters:
step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.
library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.
mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps. See iterate() for details.
reader_options (dict, optional) – Additional options passed to TreeReader.
friend_only (bool, optional, default=False) – If True, only read friend trees.
- Yields:
RecordLike – A chunk of merged data from the main and friend TTree.
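An iteration sketch with an illustrative step size:
>>> for batch in chain.iterate(step=100_000, library='ak', mode='partition'):
>>>     ...  # each batch merges data from the main and friend trees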
- dask(partition=..., library='ak', reader_options=None, friend_only=False)[source]#
Read chunks and friend trees into delayed arrays.
Warning
The renaming option will be ignored when using library='ak'.
- Parameters:
partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.
library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.
reader_options (dict, optional) – Additional options passed to TreeReader.
friend_only (bool, optional, default=False) – If True, only read friend trees.
- Returns:
DelayedRecordLike – Delayed data from the main and friend TTree.
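A delayed-read sketch with an illustrative partition size:
>>> delayed = chain.dask(partition=1_000_000, library='ak')  # DelayedRecordLike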