I have a single file containing a single ROOT tree and would like to read it with uproot4 in multiple processes, each of them reading a disjoint part of the tree. With uproot3 this was possible by passing an iterable of (start, stop) pairs for entrysteps:

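def index_generator(rank, nranks, numentries, chunk_size):
    # Contiguous slice of entries owned by this rank; integer arithmetic
    # avoids float rounding at the slice boundaries
    start = numentries * rank // nranks
    stop = numentries * (rank + 1) // nranks
    index = start
    while index < stop:
        # Yield half-open (start, stop) pairs; the last chunk may be shorter
        yield (index, min(index + chunk_size, stop))
        index += chunk_size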

entrysteps = index_generator(rank, nranks, numentries, chunk_size)
iterator = uproot.tree.iterate(
    filename, treename, branches=branches,
    namedecode="utf-8", entrysteps=entrysteps)

In uproot4, entrysteps seems to have been replaced by step_size, which no longer accepts an iterator.

Is there currently a way to do this in uproot4? If not, is such an option planned?

YSelf
  • This question would be better on GitHub Issues because you're asking for something only the authors of the software can help you with, namely "is such an option planned?" The answer is "currently no," but you can make it a feature request on GitHub Issues. It wouldn't be hard, but I didn't realize this generality was used. In fact, in your example, you don't need it either: you want `step_size=chunk_size`. – Jim Pivarski Nov 02 '20 at 16:48
  • Thanks, I'll open an issue. `step_size=chunk_size` would give me every chunk, but I want only every `nranks`th chunk, with `rank` offset, or (this is the code above) every chunk, but not from 0 to maxentries, but from rank/nranks*maxentries to (rank+1)/nranks*maxentries. – YSelf Nov 02 '20 at 17:13
  • That describes a use-case I hadn't considered. Okay! – Jim Pivarski Nov 02 '20 at 22:58
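
A possible workaround until such an option exists, sketched under some assumptions: as far as I know, uproot4's `TTree.iterate` accepts `entry_start` and `entry_stop` alongside `step_size`, so each process could restrict itself to its own contiguous slice. The helper names `rank_entry_range`, `strided_chunks`, and `iterate_rank` are made up for this illustration, and `library="np"` is just one choice of output format. This is not an official recipe from the uproot authors.

```python
def rank_entry_range(rank, nranks, numentries):
    """Return the half-open [start, stop) entry range owned by `rank`."""
    start = numentries * rank // nranks
    stop = numentries * (rank + 1) // nranks
    return start, stop


def strided_chunks(rank, nranks, numentries, chunk_size):
    """Yield every `nranks`-th chunk of `chunk_size` entries, offset by `rank`."""
    for start in range(rank * chunk_size, numentries, nranks * chunk_size):
        yield start, min(start + chunk_size, numentries)


def iterate_rank(filename, treename, branches, rank, nranks, chunk_size):
    """Iterate over this rank's contiguous slice of the tree in chunks.

    Hypothetical helper: assumes uproot4's TTree.iterate supports
    entry_start, entry_stop, and an integer step_size (entries per chunk).
    """
    import uproot  # uproot4; imported lazily so the helpers above work without it

    with uproot.open(filename) as file:
        tree = file[treename]
        start, stop = rank_entry_range(rank, nranks, tree.num_entries)
        yield from tree.iterate(
            branches,
            entry_start=start,
            entry_stop=stop,
            step_size=chunk_size,
            library="np",
        )
```

For the other layout described in the comments (every `nranks`-th chunk, offset by `rank`), one could presumably loop over `strided_chunks(...)` and call `tree.arrays(branches, entry_start=a, entry_stop=b)` for each pair, at the cost of one call per chunk.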
