Basic I/O

ROOT file I/O based on uproot.reading.open(), uproot._dask.dask() and uproot.writing.writable.recreate().

Note

Readers will use the following default options for uproot.open():

object_cache = None
array_cache = None

for uproot.dask():

open_files = False

and for both:

timeout = 180
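As an illustration of how these defaults might combine with caller-supplied options, the sketch below merges plain dictionaries. The helper name `reader_options` is hypothetical, and the rule that an explicitly given option takes precedence over a default is an assumption, not confirmed behavior of the readers.

```python
# Defaults copied from the note; the dictionary names are illustrative only.
OPEN_DEFAULTS = {"object_cache": None, "array_cache": None, "timeout": 180}
DASK_DEFAULTS = {"open_files": False, "timeout": 180}


def reader_options(defaults, **options):
    """Hypothetical helper: return the defaults updated with caller options.

    Assumption: an option given explicitly overrides the default.
    """
    return {**defaults, **options}


open_kwargs = reader_options(OPEN_DEFAULTS, timeout=30)
dask_kwargs = reader_options(DASK_DEFAULTS)
```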

Warning

Writers will always overwrite the output file if it exists.

Todo

Test fsspec-xrootd

Todo

Test uproot.writing._dask_write.dask_write(). Use dask_awkward.new_scalar_object() to return the object.

TTree

class heptools.root.io.TreeWriter(name='Events', parents=True, basket_size=..., **options)

uproot.recreate() with remote file support and TBasket size control.

Parameters:
  • name (str, optional, default='Events') – Name of tree.

  • parents (bool, optional, default=True) – Create parent directories if they do not exist.

  • basket_size (int, optional) – Size of TBasket. If not given, a new TBasket will be created for each extend() call.

  • **options (dict, optional) – Additional options passed to uproot.recreate().
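The effect of basket_size can be pictured with a small buffering sketch. `BasketBuffer` is a hypothetical stand-in, not part of heptools; it assumes that basket_size counts entries and that buffered entries are cut into baskets of exactly that size, neither of which is confirmed by the description above.

```python
class BasketBuffer:
    """Hypothetical stand-in for the buffering implied by ``basket_size``."""

    def __init__(self, basket_size=None):
        self.basket_size = basket_size
        self.pending = []  # entries waiting to be written
        self.baskets = []  # each inner list stands for one written TBasket

    def extend(self, entries):
        entries = list(entries)
        if self.basket_size is None:
            # No size given: one basket per extend() call.
            if entries:
                self.baskets.append(entries)
            return self
        self.pending.extend(entries)
        # Assumption: write a basket as soon as enough entries are buffered.
        while len(self.pending) >= self.basket_size:
            self.baskets.append(self.pending[: self.basket_size])
            self.pending = self.pending[self.basket_size :]
        return self

    def flush(self):
        """Write whatever is left, for example when the file is closed."""
        if self.pending:
            self.baskets.append(self.pending)
            self.pending = []
        return self


unbuffered = BasketBuffer().extend([1, 2]).extend([3])
buffered = BasketBuffer(basket_size=4).extend([1, 2]).extend([3, 4, 5]).flush()
```

In this sketch the unbuffered writer produces one basket per call, while the buffered writer produces baskets whose sizes do not depend on how the calls were split.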

tree

Created TTree.

Type:

Chunk

__call__(path)

Set output path.

Parameters:

path (PathLike) – Path to output ROOT file.

Returns:

self (TreeWriter)

__enter__()

Open a temporary local ROOT file for writing.

Returns:

self (TreeWriter)

__exit__(*exc)

If no exception is raised, move the temporary file to the output path and store the Chunk information in tree.
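The call/enter/exit protocol described for TreeWriter follows a common "write to a temporary file, then move on success" pattern. The sketch below shows that pattern with the standard library only. `TempThenMoveWriter` and its `write` method are hypothetical, it writes text rather than ROOT data, and deleting the temporary file on failure is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


class TempThenMoveWriter:
    """Hypothetical stand-in that mimics the call/enter/exit protocol."""

    def __init__(self, parents=True):
        self.parents = parents
        self.path = None
        self._tmp = None
        self._file = None

    def __call__(self, path):
        # Set the output path; returning self allows ``with writer(path)``.
        self.path = Path(path)
        return self

    def __enter__(self):
        # Open a temporary local file for writing.
        self._file = tempfile.NamedTemporaryFile("w", delete=False)
        self._tmp = Path(self._file.name)
        return self

    def write(self, text):
        self._file.write(text)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        if exc_type is None:
            if self.parents:
                self.path.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(self._tmp), str(self.path))
        else:
            # Assumption: partial output is discarded on failure.
            self._tmp.unlink()
        return False  # let any exception propagate


with tempfile.TemporaryDirectory() as workdir:
    good = Path(workdir) / "nested" / "out.txt"
    with TempThenMoveWriter()(good) as writer:
        writer.write("hello")
    good_text = good.read_text()

    bad = Path(workdir) / "bad.txt"
    try:
        with TempThenMoveWriter()(bad) as writer:
            writer.write("partial")
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    bad_exists = bad.exists()
```

Writing locally first means that a failed job never leaves a truncated file at the output path.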

extend(data)

Extend the TTree with data using uproot.writing.writable.WritableTree.extend().

Parameters:

data (RecordLike) – Data to extend.

Returns:

self (TreeWriter)

save_metadata(name, metadata)

Save metadata to ROOT file.

Parameters:
  • name (str) – Name of metadata.

  • metadata (dict[str, UprootSupportedDtypes]) – A dictionary of metadata.

class heptools.root.io.TreeReader(branch_filter=None, transform=None, **options)

Read data from Chunk.

Parameters:
  • branch_filter (Callable[[set[str]], set[str]], optional) – A function to select branches. If not given, all branches will be read.

  • transform (Callable[[RecordLike], RecordLike], optional) – A function to transform the data after reading. If not given, no transformation will be applied.

  • **options (dict, optional) – Additional options passed to uproot.open().
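The two callables can be illustrated without ROOT files. In the sketch below a plain dictionary of lists stands in for RecordLike, and `apply_reader_hooks`, the branch names, and both example functions are hypothetical. The order (filter, read, transform) follows the parameter descriptions above.

```python
def drop_trigger_branches(branches: set[str]) -> set[str]:
    """Example branch_filter: drop every branch whose name starts with 'HLT_'."""
    return {name for name in branches if not name.startswith("HLT_")}


def add_total_pt(data: dict) -> dict:
    """Example transform: add a derived column without mutating the input."""
    out = dict(data)
    out["pt_total"] = [a + b for a, b in zip(data["pt1"], data["pt2"])]
    return out


def apply_reader_hooks(table, branch_filter=None, transform=None):
    """Hypothetical helper: select branches, "read" them, then transform."""
    names = set(table)
    if branch_filter is not None:
        names = branch_filter(names)
    data = {name: table[name] for name in table if name in names}
    if transform is not None:
        data = transform(data)
    return data


table = {"pt1": [1.0, 2.0], "pt2": [0.5, 0.5], "HLT_mu": [True, False]}
result = apply_reader_hooks(table, drop_trigger_branches, add_total_pt)
unfiltered = apply_reader_hooks(table)
```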

arrays(source, library='ak', **options)

Read source into array.

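Parameters:
  • source (Chunk) – A chunk of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

Returns: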

RecordLike – Data from TTree.

concat(*sources, library='ak', **options)

Read sources into one array. The branches of sources must be the same after filtering.

Todo

Add multiprocessing support.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • **options (dict, optional) – Additional options passed to arrays().

Returns:

RecordLike – Concatenated data from TTree.

iterate(*sources, step=..., library='ak', mode='partition', **options)

Iterate over sources.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.

  • library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.

  • mode (Literal['balance', 'partition'], optional, default='partition') –

    The mode to generate iteration steps.

    • mode='balance': use balance(). Output arrays are not guaranteed to have length step, but no concatenation is needed.

    • mode='partition': use partition(). Every output array except the last is guaranteed to have length step, but concatenation is needed.

  • **options (dict, optional) – Additional options passed to arrays().

Yields:

RecordLike – A chunk of data from TTree.
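The trade-off between the two modes can be sketched with entry counts alone. The functions below are hypothetical and only mirror the behavior described for mode='partition' and mode='balance'; the rounding rule in `balance_steps` is an assumption and may differ from what balance() actually does.

```python
def partition_steps(sizes, step):
    """Cut the concatenated entries into pieces of exactly ``step`` entries.

    Each piece is a list of (chunk_index, start, stop) ranges, so a piece
    may span several chunks and its parts must be concatenated.
    """
    pieces, current, room = [], [], step
    for index, size in enumerate(sizes):
        start = 0
        while start < size:
            stop = min(size, start + room)
            current.append((index, start, stop))
            room -= stop - start
            start = stop
            if room == 0:
                pieces.append(current)
                current, room = [], step
    if current:
        pieces.append(current)  # the last piece may be shorter than step
    return pieces


def balance_steps(sizes, step):
    """Split each chunk on its own into near-equal pieces of about ``step``.

    Every piece is a single (chunk_index, start, stop) range, so no
    concatenation is needed, but the length is not exactly ``step``.
    """
    pieces = []
    for index, size in enumerate(sizes):
        if size == 0:
            continue
        count = max(1, round(size / step))  # assumption: round to nearest
        base, extra = divmod(size, count)
        start = 0
        for k in range(count):
            stop = start + base + (1 if k < extra else 0)
            pieces.append((index, start, stop))
            start = stop
    return pieces


sizes = [5, 7]  # two chunks with 5 and 7 entries
partitioned = partition_steps(sizes, step=4)
balanced = balance_steps(sizes, step=4)
```

With these inputs the partitioned pieces all hold 4 entries but the second one spans both chunks, while the balanced pieces hold 5, 4 and 3 entries and each stays inside one chunk.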

dask(*sources, partition=..., library='ak')

Read sources into delayed arrays.

Parameters:
  • sources (tuple[Chunk]) – One or more chunks of TTree.

  • partition (int, optional) – If given, the sources will be split into smaller chunks targeting partition entries.

  • library (Literal['ak', 'np'], optional, default='ak') – The library used to represent arrays.

Returns:

DelayedRecordLike – Delayed data from TTree.

load_metadata(name, source, builtin_types=False)

Load metadata from ROOT file.

Parameters:
  • name (str) – Name of the metadata.

  • source (Chunk) – The ROOT file source.

  • builtin_types (bool, optional, default=False) – Convert numpy dtypes to builtin types.

Returns:

dict[str, UprootSupportedDtypes] – A dictionary of metadata.
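What builtin_types=True might do to a metadata dictionary can be sketched as follows. The helpers are hypothetical and rely on the `tolist()` method that NumPy scalars and arrays provide; `array.array` from the standard library stands in for a NumPy array so that the example has no third-party dependency.

```python
from array import array


def to_builtin(value):
    """Hypothetical helper: turn NumPy-like values into plain Python ones."""
    if hasattr(value, "tolist"):
        # NumPy scalars and arrays (and array.array) all provide tolist().
        return value.tolist()
    return value


def metadata_to_builtin(metadata):
    """Apply to_builtin() to every value of a metadata dictionary."""
    return {key: to_builtin(value) for key, value in metadata.items()}


raw = {"weights": array("d", [1.0, 2.0]), "label": "signal", "count": 3}
converted = metadata_to_builtin(raw)
```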

dask

heptools.root.merge.resize(path, *sources, step, chunk_size=..., writer_options=None, reader_options=None, clean_source=True, dask=False)

merge() sources into Chunk and clean() sources after merging.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • sources (tuple[Chunk]) – Chunks to merge.

  • step (int) – Number of entries to read and write in each iteration step.

  • chunk_size (int, optional) – Number of entries in each chunk. If not given, all entries will be merged into one chunk.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • clean_source (bool, optional, default=True) – If True, remove the source chunk after moving.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

list[Chunk] or Delayed – Merged chunks.

heptools.root.merge.merge(path, *sources, step, writer_options=None, reader_options=None, dask=False)

Merge sources into one Chunk.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • sources (tuple[Chunk]) – Chunks to merge.

  • step (int) – Number of entries to read and write in each iteration step.

  • writer_options (dict, optional) – Additional options passed to TreeWriter.

  • reader_options (dict, optional) – Additional options passed to TreeReader.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Chunk or Delayed – Merged chunk.

heptools.root.merge.clean(source, merged, dask=False)

Clean source after merging.

Parameters:
  • source (list[Chunk]) – Source chunks to be cleaned.

  • merged (list[Chunk]) – Merged chunks.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

merged (list[Chunk] or Delayed)

heptools.root.merge.move(path, source, clean_source=True, dask=False)

Move source to path.

Parameters:
  • path (PathLike) – Path to output ROOT file.

  • source (Chunk) – Source chunk to move.

  • clean_source (bool, optional, default=True) – If True, remove the source chunk after moving.

  • dask (bool, optional, default=False) – If True, return a Delayed object.

Returns:

Chunk or Delayed – Moved chunk.
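The meaning of clean_source can be shown with a local-only analogue of move(). `move_file` is a hypothetical helper that works on ordinary files rather than Chunk objects, ignores the dask option, and assumes that parent directories of the destination are created as needed.

```python
import shutil
import tempfile
from pathlib import Path


def move_file(path, source, clean_source=True):
    """Hypothetical analogue: copy source to path, then optionally delete it."""
    path, source = Path(path), Path(source)
    path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, path)
    if clean_source:
        source.unlink()
    return path


with tempfile.TemporaryDirectory() as workdir:
    src = Path(workdir) / "in.root"
    src.write_bytes(b"data")

    # clean_source=False keeps the source in place.
    kept = move_file(Path(workdir) / "keep" / "out.root", src, clean_source=False)
    kept_ok = kept.read_bytes() == b"data" and src.exists()

    # The default removes the source after the copy succeeds.
    moved = move_file(Path(workdir) / "moved.root", src)
    moved_ok = moved.read_bytes() == b"data" and not src.exists()
```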