Basic I/O

A wrapper around the ROOT file I/O functions uproot.reading.open(), uproot._dask.dask() and uproot.writing.writable.recreate().

Note

Readers will use the following default options for uproot.open():

object_cache = None
array_cache = None

for uproot.dask():

open_files = False

and for both:

timeout = 180
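As a rough illustration of how defaults like these could combine with caller-supplied options, the sketch below overlays user options on a dictionary of defaults. The precedence rule (explicit options win) is an assumption and is not stated in this page; `OPEN_DEFAULTS`, `DASK_DEFAULTS` and `resolve_options` are hypothetical names used only for this example.

```python
# Hypothetical dictionaries mirroring the default values listed in the note above.
OPEN_DEFAULTS = {"object_cache": None, "array_cache": None, "timeout": 180}
DASK_DEFAULTS = {"open_files": False, "timeout": 180}


def resolve_options(defaults, **options):
    """Return the defaults overlaid with caller-supplied options.

    Assumption: an explicitly passed option replaces the default value.
    """
    return {**defaults, **options}


# A caller shortening the timeout keeps the other defaults untouched.
print(resolve_options(OPEN_DEFAULTS, timeout=60))
```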

Warning

Writers will always overwrite the output file if it exists.

Todo

Use dask_awkward.new_scalar_object() to return object.

TTree

class heptools.root.TreeWriter(name='Events', parents=True, basket_size=..., **options)

uproot.recreate() with remote file support and TBasket size control.

Parameters:
  • name (str, optional, default='Events') – Default name of tree.

  • parents (bool, optional, default=True) – Create parent directories if they do not exist.

  • basket_size (int, optional) – Size of TBasket. If not given, a new TBasket will be created for each extend() call.

  • **options (dict, optional) – Additional options passed to uproot.recreate().
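The effect of basket_size can be pictured with a small buffering sketch. It assumes the size is counted in entries, which this page does not state, and uses plain Python lists in place of TBaskets; `BasketBuffer` is a hypothetical class, not part of heptools.

```python
class BasketBuffer:
    """Hypothetical buffer that groups rows into baskets of `basket_size` entries."""

    def __init__(self, basket_size=None):
        self.basket_size = basket_size
        self.pending = []  # rows received but not yet written
        self.baskets = []  # each inner list stands in for one TBasket

    def extend(self, rows):
        self.pending.extend(rows)
        if self.basket_size is None:
            # No size control: every extend() call produces its own basket.
            self.flush()
        else:
            # Emit only full baskets and keep the remainder buffered.
            while len(self.pending) >= self.basket_size:
                self.baskets.append(self.pending[: self.basket_size])
                self.pending = self.pending[self.basket_size:]
        return self

    def flush(self):
        # Write whatever is left, even if it is smaller than basket_size.
        if self.pending:
            self.baskets.append(self.pending)
            self.pending = []
        return self


sized = BasketBuffer(basket_size=3).extend([1, 2]).extend([3, 4, 5, 6, 7])
print(sized.baskets, sized.pending)
```

With a fixed size the number of baskets no longer depends on how often extend() is called, which is the motivation for buffering in this sketch.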

tree: heptools.root.chunk.Chunk | list[heptools.root.chunk.Chunk]

Created TTree.

Type:

Chunk or list[Chunk]

__call__(path, name=None)

Set output path.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • name (str, optional) – Name of tree. If given, it will temporarily override the tree name.

Returns:

self (TreeWriter)

__enter__()

Open a temporary local ROOT file for writing.

Returns:

self (TreeWriter)

__exit__(*exc)

If no exception is raised, move the temporary file to the output path and store the Chunk information in tree.
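The write-to-temporary-then-move pattern described for __enter__() and __exit__() can be sketched with the standard library alone. This is a minimal illustration on text files, not the TreeWriter implementation: `TempThenMove` is a hypothetical class, and deleting the temporary file when an exception occurs is an assumption.

```python
import os
import shutil
import tempfile


class TempThenMove:
    """Hypothetical context manager: write to a temp file, move it on success."""

    def __init__(self, path):
        self.path = path
        self.tmp = None

    def __enter__(self):
        # Create a temporary local file; it is reopened by name when writing.
        fd, self.tmp = tempfile.mkstemp(suffix=".tmp")
        os.close(fd)
        return self

    def write(self, text):
        with open(self.tmp, "a") as f:
            f.write(text)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Success: create parent directories if needed, then move the file.
            parent = os.path.dirname(self.path)
            if parent:
                os.makedirs(parent, exist_ok=True)
            shutil.move(self.tmp, self.path)
        else:
            # Assumption: on failure the temporary file is simply discarded.
            os.remove(self.tmp)
        return False  # never swallow the exception


out = os.path.join(tempfile.mkdtemp(), "nested", "out.txt")
with TempThenMove(out) as writer:
    writer.write("hello")
print(os.path.exists(out))
```

The output path never holds a partially written file, because the move only happens once the block has finished without error.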

extend(data)

Extend the TTree with data using uproot.writing.writable.WritableTree.extend().

Note

The VirtualArray in awkward < 2.0 will be materialized.

Parameters:

data (RecordLike) – Data to extend.

Returns:

self (TreeWriter)

switch(name)

Switch to another tree in the same ROOT file.

Warning

The buffer will be immediately flushed regardless of the basket size. It is recommended to finish the current tree before switching.

Parameters:

name (str) – Name of tree.

Returns:

self (TreeWriter)

save_metadata(name, metadata)

Save metadata to ROOT file.

Parameters:
  • name (str) – Name of metadata.

  • metadata (dict[str, UprootSupportedDtypes]) – A dictionary of metadata.

Returns:

self (TreeWriter)

class heptools.root.TreeReader(branch_filter=None, transform=None, **options)

Read data from Chunk.

Parameters:
  • branch_filter (Callable[[set[str]], set[str]], optional) – A function to select branches. If not given, all branches will be read.

  • transform (Callable[[RecordLike], RecordLike], optional) – A function to transform the data after reading. If not given, no transformation will be applied.

  • **options (dict, optional) – Additional options passed to uproot.open().
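To make the two callables concrete, the sketch below shows one possible branch_filter and transform matching the documented signatures. A plain dict of lists stands in for RecordLike, and `keep_jets`, `add_ht` and `read` are hypothetical helpers; the order of operations (filter first, then transform) follows the parameter descriptions above.

```python
def keep_jets(branches: set[str]) -> set[str]:
    """Hypothetical branch filter: keep branches whose names start with 'Jet_'."""
    return {name for name in branches if name.startswith("Jet_")}


def add_ht(data: dict) -> dict:
    """Hypothetical transform: add the per-event sum of Jet_pt as 'HT'."""
    return {**data, "HT": [sum(pts) for pts in data["Jet_pt"]]}


def read(columns: dict, branch_filter=None, transform=None) -> dict:
    """Stand-in reader: select branches first, then transform the result."""
    names = set(columns)
    if branch_filter is not None:
        names = branch_filter(names)
    data = {name: columns[name] for name in sorted(names)}
    if transform is not None:
        data = transform(data)
    return data


# Two events with jagged jet and muon collections.
events = {
    "Jet_pt": [[30.0, 20.0], [50.0]],
    "Muon_pt": [[10.0], []],
}
print(read(events, branch_filter=keep_jets, transform=add_ht))
```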

arrays(source, library='ak', **options)

Read source into array.

Parameters:
Returns:

RecordLike – Data from TTree.

concat(*sources, library='ak', **options)

Read sources into one array. The branches of sources must be the same after filtering.

Todo

Add multiprocessing support.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • **options (dict, optional) – Additional options passed to arrays().

Returns:

RecordLike – Concatenated data from TTree.

iterate(*sources, step=..., library='ak', mode='partition', **options)

Iterate over sources.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • mode (Literal['balance', 'partition'], optional, default='partition') –

    The mode to generate iteration steps.

    • mode='balance': use balance(). The length of each output array is not guaranteed to be step, but no concatenation is needed.

    • mode='partition': use partition(). The length of each output array is guaranteed to be step, except for the last one, but concatenation may be needed.

  • **options (dict, optional) – Additional options passed to arrays().
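The difference between the two modes can be sketched on chunk sizes alone. This is one interpretation of the descriptions above, not the actual balance() or partition() code: `partition_steps` and `balance_steps` are hypothetical helpers, and the rule used to balance a chunk (split it into ceil(n / step) nearly equal parts) is an assumption.

```python
import math


def partition_steps(sizes, step):
    """Yield groups of (chunk, start, stop) ranges holding exactly `step`
    entries each, except possibly the last group. A group may span several
    chunks, so its pieces have to be concatenated."""
    group, room = [], step
    for i, n in enumerate(sizes):
        start = 0
        while start < n:
            stop = min(n, start + room)
            group.append((i, start, stop))
            room -= stop - start
            start = stop
            if room == 0:
                yield group
                group, room = [], step
    if group:
        yield group


def balance_steps(sizes, step):
    """Yield (chunk, start, stop) ranges that never cross a chunk boundary.
    Each chunk is split into nearly equal parts of at most `step` entries."""
    for i, n in enumerate(sizes):
        if n == 0:
            continue
        parts = math.ceil(n / step)
        base, extra = divmod(n, parts)
        start = 0
        for k in range(parts):
            stop = start + base + (1 if k < extra else 0)
            yield (i, start, stop)
            start = stop


sizes = [10, 5]  # entries in two chunks
print(list(partition_steps(sizes, step=4)))
print(list(balance_steps(sizes, step=4)))
```

In this sketch the partition groups have lengths 4, 4, 4 and 3, with the third group drawing from both chunks, while the balanced ranges have lengths 4, 3, 3, 3 and 2 and each stays inside a single chunk.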

Yields:

RecordLike – A chunk of data from TTree.

dask(*sources, partition=..., library='ak')

Read sources into delayed arrays.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

Returns:

DelayedRecordLike – Delayed data from TTree.

load_metadata(name, source, builtin_types=False)

Load metadata from ROOT file.

Parameters:
  • name (str) – Name of the metadata.

  • source (Chunk) – The ROOT file source.

  • builtin_types (bool, optional, default=False) – Convert numpy dtypes to builtin types.

Returns:

dict[str, UprootSupportedDtypes] – A dictionary of metadata.

dask

heptools.root.merge.resize(path, *sources, step, chunk_size=..., writer_options=None, reader_options=None, clean_source=True, transform=None, dask=False)

Merge sources into one or more Chunk objects with merge(), then clean() the sources after merging.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • sources (tuple[Chunk]) – Chunks to merge.

  • step (int) – Number of entries to read and write in each iteration step.

  • chunk_size (int, optional) – Number of entries in each chunk. If not given, all entries will be merged into one chunk.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • clean_source (bool, optional, default=True) – If True, remove the source chunks after merging.

  • transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

list[Chunk] or Delayed – Merged chunks.
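Read literally, chunk_size fixes how many entries each output chunk holds. The sketch below computes the resulting chunk sizes under that reading. It is an assumption that the last chunk takes the remainder rather than the entries being rebalanced, and `output_chunk_sizes` is a hypothetical helper.

```python
def output_chunk_sizes(source_sizes, chunk_size=None):
    """Return the number of entries in each output chunk.

    Assumption: chunks are filled to `chunk_size` in order and the last one
    holds whatever remains. Without `chunk_size` everything lands in one chunk.
    """
    total = sum(source_sizes)
    if chunk_size is None:
        return [total]
    full, rest = divmod(total, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


print(output_chunk_sizes([3, 4, 5], chunk_size=5))
print(output_chunk_sizes([3, 4, 5]))
```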

heptools.root.merge.merge(path, *sources, step, writer_options=None, reader_options=None, transform=None, dask=False)

Merge sources into one Chunk.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • sources (tuple[Chunk]) – Chunks to merge.

  • step (int) – Number of entries to read and write in each iteration step.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • transform (Callable[[ak.Array], ak.Array], optional) – A function to transform the array before writing.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Chunk or Delayed – Merged chunk.

heptools.root.merge.clean(source, merged, dask=False)

Clean source after merging.

Parameters:
  • source (list[Chunk]) – Source chunks to be cleaned.

  • merged (list[Chunk]) – Merged chunks.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

merged (list[Chunk] or Delayed)

heptools.root.merge.move(path, source, clean_source=True, dask=False)

Move source to path.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • source (Chunk) – Source chunk to move.

  • clean_source (bool, optional, default=True) – If True, remove the source chunk after moving.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Chunk or Delayed – Moved chunk.
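The role of clean_source can be illustrated with ordinary files. The sketch copies a file to the new path and removes the original only when asked. It works on local paths rather than Chunk objects, `move_file` is a hypothetical helper, and implementing the move as copy followed by delete is an assumption.

```python
import os
import shutil


def move_file(path, source, clean_source=True):
    """Hypothetical stand-in for moving a file: copy, then optionally delete."""
    parent = os.path.dirname(path)
    if parent:
        # Make sure the destination directory exists before copying.
        os.makedirs(parent, exist_ok=True)
    shutil.copy2(source, path)
    if clean_source:
        # Remove the original only after the copy has succeeded.
        os.remove(source)
    return path
```

Passing clean_source=False leaves the original in place, which turns the operation into a copy.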