Basic I/O
ROOT file I/O based on uproot.reading.open(), uproot._dask.dask() and uproot.writing.writable.recreate().
Note
Readers will use the following default options for uproot.open():
object_cache = None
array_cache = None
for uproot.dask():
open_files = False
and for both:
timeout = 180
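The defaults listed in the note can be pictured as plain dictionaries that are combined with whatever the caller passes. The sketch below is only an illustration: the names COMMON_DEFAULTS, OPEN_DEFAULTS, DASK_DEFAULTS and resolve_options are hypothetical, and it assumes that options supplied by the caller take precedence over the defaults, which this page does not state.

```python
# Hypothetical names for illustration only; none of them is part of
# heptools or uproot.
COMMON_DEFAULTS = {"timeout": 180}
OPEN_DEFAULTS = {"object_cache": None, "array_cache": None}
DASK_DEFAULTS = {"open_files": False}


def resolve_options(kind, **user_options):
    """Build keyword arguments for an "open"-style or "dask"-style call.

    Assumption: options given by the caller override the defaults.
    """
    specific = OPEN_DEFAULTS if kind == "open" else DASK_DEFAULTS
    # Later dictionaries win when the same key appears more than once.
    return {**COMMON_DEFAULTS, **specific, **user_options}


open_kwargs = resolve_options("open", timeout=60)
dask_kwargs = resolve_options("dask")
print(open_kwargs)
print(dask_kwargs)
```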
Warning
Writers will always overwrite the output file if it exists.
Todo
Test fsspec-xrootd
Todo
Test uproot.writing._dask_write.dask_write()
Use dask_awkward.new_scalar_object() to return object.
TTree
- class heptools.root.io.TreeWriter(name='Events', parents=True, basket_size=..., **options)
uproot.recreate() with remote file support and TBasket size control.
- Parameters:
name (str, optional, default='Events') – Name of tree.
parents (bool, optional, default=True) – Create parent directories if they do not exist.
basket_size (int, optional) – Size of TBasket. If not given, a new TBasket will be created for each extend() call.
**options (dict, optional) – Additional options passed to uproot.recreate().
- __call__(path)
Set output path.
- Parameters:
path (PathLike) – Path to output ROOT file.
- Returns:
self (TreeWriter)
- __exit__(*exc)
If no exception is raised, move the temporary file to the output path and store Chunk information to tree.
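The write-to-a-temporary-file-then-move behaviour described for __exit__ can be sketched with the standard library alone. TempThenMoveWriter below is a hypothetical stand-in that writes text rather than ROOT data; it is not the TreeWriter implementation and only mirrors the documented shape (a call that sets the path and returns self, an extend() that returns self, and an exit that moves the file only on success).

```python
import os
import shutil
import tempfile


class TempThenMoveWriter:
    """Hypothetical stand-in; writes text instead of a TTree."""

    def __init__(self, parents=True):
        self._parents = parents
        self._path = None
        self._tmp = None
        self._file = None

    def __call__(self, path):
        # Set the output path and return self so the call works in `with`.
        self._path = os.fspath(path)
        return self

    def __enter__(self):
        # All writes go to a temporary file first.
        fd, self._tmp = tempfile.mkstemp(suffix=".tmp")
        self._file = os.fdopen(fd, "w")
        return self

    def extend(self, text):
        self._file.write(text)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        if exc_type is None:
            parent = os.path.dirname(self._path)
            if self._parents and parent:
                os.makedirs(parent, exist_ok=True)
            # Move into place; an existing file at the target is replaced.
            shutil.move(self._tmp, self._path)
        else:
            # On failure, discard the partial output.
            os.remove(self._tmp)
        return False  # never swallow the exception


out_dir = tempfile.mkdtemp()
target = os.path.join(out_dir, "nested", "output.txt")
failed = os.path.join(out_dir, "failed.txt")
writer = TempThenMoveWriter()

with writer(target) as w:
    w.extend("first\n").extend("second\n")

try:
    with writer(failed) as w:
        w.extend("partial")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

print(os.path.exists(target), os.path.exists(failed))
```

With this pattern a failed job never leaves a half-written file at the output path, which is one plausible reason for the design.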
- extend(data)
Extend the TTree with data using uproot.writing.writable.WritableTree.extend().
- Parameters:
data (RecordLike) – Data to extend.
- Returns:
self (TreeWriter)
- class heptools.root.io.TreeReader(branch_filter=None, transform=None, **options)
Read data from Chunk.
- Parameters:
branch_filter (Callable[[set[str]], set[str]], optional) – A function to select branches. If not given, all branches will be read.
transform (Callable[[RecordLike], RecordLike], optional) – A function to transform the data after reading. If not given, no transformation will be applied.
**options (dict, optional) – Additional options passed to uproot.open().
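The two callables accepted by TreeReader have simple shapes: branch_filter maps a set of branch names to a subset, and transform maps a record to a record. The sketch below shows one possible pair. The function names and branch names are hypothetical, and a plain dict of lists stands in for RecordLike, whose concrete type is not defined on this page.

```python
def keep_kinematics(branches: set[str]) -> set[str]:
    """Example branch_filter: keep only names with a kinematic suffix."""
    suffixes = ("_pt", "_eta", "_phi")
    return {name for name in branches if name.endswith(suffixes)}


def add_pt_squared(data: dict) -> dict:
    """Example transform: return a copy with one derived column.

    A dict of lists is used here as a stand-in for RecordLike.
    """
    out = dict(data)
    out["Jet_pt2"] = [pt * pt for pt in data["Jet_pt"]]
    return out


# Hypothetical branch names, for illustration only.
available = {"Jet_pt", "Jet_eta", "Jet_phi", "HLT_IsoMu24", "run"}
selected = keep_kinematics(available)

record = {name: [1.0, 2.0] for name in sorted(selected)}
transformed = add_pt_squared(record)
print(sorted(selected))
print(transformed["Jet_pt2"])
```

Under these assumptions the two functions would be passed as TreeReader(branch_filter=keep_kinematics, transform=add_pt_squared).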
- arrays(source, library='ak', **options)
Read source into array.
- Parameters:
source (Chunk) – Chunk of TTree.
library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.
library='ak': return ak.Array.
library='pd': return pandas.DataFrame.
library='np': return dict of numpy.ndarray.
**options (dict, optional) – Additional options passed to uproot.behaviors.TBranch.HasBranches.arrays().
- Returns:
RecordLike – Data from TTree.
- concat(*sources, library='ak', **options)
Read sources into one array. The branches of sources must be the same after filtering.
Todo
Add multiprocessing support.
- iterate(*sources, step=..., library='ak', mode='partition', **options)
Iterate over sources.
- Parameters:
step (int, optional) – Number of entries to read in each iteration step. If not given, the chunk size will be used and the mode will be ignored.
library (Literal['ak', 'np', 'pd'], optional, default='ak') – The library used to represent arrays.
mode (Literal['balance', 'partition'], optional, default='partition') – The mode to generate iteration steps.
mode='balance': use balance(). The length of the output arrays is not guaranteed to be step, but no concatenation is needed.
mode='partition': use partition(). The length of the output arrays is guaranteed to be step, except for the last one, but concatenation is needed.
**options (dict, optional) – Additional options passed to arrays().
- Yields:
RecordLike – A chunk of data from TTree.
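The trade-off between the two modes can be illustrated on entry counts alone. The sketch below is one interpretation of the description above, not the actual balance() or partition() code: partition_steps and balance_steps are hypothetical helpers, each source is reduced to its number of entries, and the balanced split assumes a ceiling division per chunk.

```python
def partition_steps(sizes, step):
    """Yield steps of exactly `step` entries (the last may be shorter).

    Each step is a list of (chunk_index, start, stop) pieces, so a step
    may span several chunks and its pieces must be concatenated.
    """
    pieces, filled = [], 0
    for index, size in enumerate(sizes):
        start = 0
        while start < size:
            take = min(step - filled, size - start)
            pieces.append((index, start, start + take))
            filled += take
            start += take
            if filled == step:
                yield pieces
                pieces, filled = [], 0
    if pieces:
        yield pieces


def balance_steps(sizes, step):
    """Yield (chunk_index, start, stop) pieces that never cross a chunk.

    Assumption: each chunk is split into ceil(size / step) pieces of
    nearly equal length, so lengths are close to `step` but may differ.
    """
    for index, size in enumerate(sizes):
        if size == 0:
            continue
        n = -(-size // step)  # ceiling division
        base, extra = divmod(size, n)
        start = 0
        for i in range(n):
            stop = start + base + (1 if i < extra else 0)
            yield (index, start, stop)
            start = stop


sizes = [5, 5]  # two chunks with five entries each
partitioned = list(partition_steps(sizes, 3))
balanced = list(balance_steps(sizes, 3))
partition_lengths = [sum(b - a for _, a, b in p) for p in partitioned]
balance_lengths = [b - a for _, a, b in balanced]
print(partition_lengths)
print(balance_lengths)
```

In this toy case the second partitioned step combines the tail of the first chunk with the head of the second, which is why concatenation is needed, while every balanced piece stays inside a single chunk.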
- dask(*sources, partition=..., library='ak')
Read sources into delayed arrays.
- Parameters:
- Returns:
DelayedRecordLike – Delayed data from TTree.
dask
- heptools.root.merge.resize(path, *sources, step, chunk_size=..., writer_options=None, reader_options=None, clean_source=True, dask=False)
merge() sources into Chunk and clean() sources after merging.
- Parameters:
path (PathLike) – Path to output ROOT file.
step (int) – Number of entries to read and write in each iteration step.
chunk_size (int, optional) – Number of entries in each chunk. If not given, all entries will be merged into one chunk.
writer_options (dict, optional) – Additional options passed to TreeWriter.
reader_options (dict, optional) – Additional options passed to TreeReader.
clean_source (bool, optional, default=True) – If True, remove the source chunk after moving.
dask (bool, optional, default=False) – If True, return a Delayed object.
- Returns:
list[Chunk] or Delayed – Merged chunks.
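The effect of chunk_size on the number of output chunks can be sketched with entry counts only. output_chunk_sizes is a hypothetical helper, and it assumes fixed-size chunks with a shorter remainder at the end; the library may distribute entries differently, for example by balancing the chunk lengths.

```python
def output_chunk_sizes(source_sizes, chunk_size=None):
    """Return the entry count of each output chunk.

    Assumption: chunks are filled to `chunk_size` in order and the last
    chunk takes the remainder. Without `chunk_size`, one chunk holds
    all entries.
    """
    total = sum(source_sizes)
    if chunk_size is None:
        return [total]
    full, rest = divmod(total, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


sources = [40, 25, 35]  # entries in three hypothetical source chunks
single = output_chunk_sizes(sources)
resized = output_chunk_sizes(sources, chunk_size=30)
print(single)
print(resized)
```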
- heptools.root.merge.merge(path, *sources, step, writer_options=None, reader_options=None, dask=False)
Merge sources into one Chunk.
- Parameters:
path (PathLike) – Path to output ROOT file.
step (int) – Number of entries to read and write in each iteration step.
writer_options (dict, optional) – Additional options passed to TreeWriter.
reader_options (dict, optional) – Additional options passed to TreeReader.
dask (bool, optional, default=False) – If True, return a Delayed object.
- Returns:
Chunk or Delayed – Merged chunk.