
I'm trying to use the excellent uproot and awkward-array to read some analysis data stored in a TTree. I understand that ROOT doesn't write nested vectors (i.e., std::vector<std::vector<int>>) in a columnar format, but following this discussion, I modified my tree output to contain two separate branches: one std::vector<int> with the contents and one std::vector<int> with the offsets. Between fills of the tree, values are pushed into the contents vector several times, and after each push the current size of the contents vector is appended to the offsets.
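
For illustration, the writer-side layout described above can be sketched in plain Python. This is a minimal sketch, not the actual ROOT filling code: `encode_event` is a hypothetical name, and the leading 0 in the offsets is an assumption taken from the example data in the code below, so that offsets[i]:offsets[i+1] bounds inner list i.

```python
def encode_event(sublists):
    """Hypothetical helper: flatten one event's nested list into the two branches."""
    contents = []
    offsets = [0]  # assumed leading 0, matching the example offsets in the question
    for sub in sublists:
        contents.extend(sub)           # push this inner list's values into contents
        offsets.append(len(contents))  # record the running size of contents
    return contents, offsets


contents, offsets = encode_event([[12], [14, 3, 4]])
print(contents)  # [12, 14, 3, 4]
print(offsets)   # [0, 1, 4]
```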

My idea was to recreate the structure I need as a nested JaggedArray when I read the tree. However, reading through the awkward-array documentation, I can't figure out how to construct this nested JaggedArray without looping in Python. fromoffsets requires a 1D index, which means the jagged indices must be flattened, and then their structure is lost. None of the other classmethods seem to fit. The example below uses a generator, which I expect will be rather slow because it loops in Python. Is there a better way to construct the JaggedArray? Or a better way to store the data in the tree?

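import awkward as ak  # awkward-array 0.x API (fromiter, JaggedArray.fromoffsets)
# per-event offsets (offsets branch) and per-event contents (contents branch)
all_jagged_indices = ak.fromiter([[0, 1, 4], [0, 1, 2, 3]])
all_constituents = ak.fromiter([[12, 14, 3, 4], [2, 8, 3]])
# build one JaggedArray per event, then let fromiter assemble them (a Python-level loop)
output = ak.fromiter(
    (ak.JaggedArray.fromoffsets(jagged_indices, constituents)
     for jagged_indices, constituents in
     zip(all_jagged_indices, all_constituents))
)
expected = ak.fromiter([[[12], [14, 3, 4]], [[2], [8], [3]]])
# one .all() per level of nesting reduces the element-wise comparison to a single bool
assert (output == expected).all().all().all()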
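
To make the semantics of fromoffsets explicit without any library, here is a plain-Python sketch of the same per-event reconstruction: each pair of consecutive offsets gives the start and stop of one inner list. `decode_event` is a hypothetical name, not part of awkward-array, and the sketch still loops over events in Python, which is exactly the cost the question is worried about.

```python
def decode_event(offsets, contents):
    """Hypothetical helper: per-event equivalent of JaggedArray.fromoffsets."""
    # consecutive (start, stop) offset pairs delimit each inner list
    return [list(contents[start:stop]) for start, stop in zip(offsets[:-1], offsets[1:])]


all_offsets = [[0, 1, 4], [0, 1, 2, 3]]
all_contents = [[12, 14, 3, 4], [2, 8, 3]]
# the outer loop over events is the Python-level loop the generator above also performs
nested = [decode_event(o, c) for o, c in zip(all_offsets, all_contents)]
print(nested)  # [[[12], [14, 3, 4]], [[2], [8], [3]]]
```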

Thanks!

You've got the right idea, but ultimately, there isn't a way to convert a jagged ObjectArray into a doubly jagged array without a "for" loop. The structure of the data requires it.

This is a key issue, though, and it's one reason why some of these algorithms are being ported to C++. The last plot in this talk directly addresses this kind of data (jagged^N of numbers) with the "for" loop moved into C++. This is in development for Awkward 1.0 and Uproot 4.0, which are scheduled to be ready for users at the end of April. (At that point, the conversion of std::vector<std::vector<numbers>> will be automatic, because there will no longer be a performance penalty.)

At the moment, however, a Python "for" loop, implicitly within fromiter, is the best you can do.

Jim Pivarski
  • Thanks, Jim! I look forward to the arrival of uproot 4! When it arrives, do you expect there to be a performance difference between combining the two jagged branches and storing the `std::vector<std::vector<int>>` directly? The former is definitely more error prone, so if there's no difference (it sounds like there won't be one), I'll just use the latter. – Raymond Ehlers Feb 16 '20 at 19:36
  • The reason Uproot 3 doesn't give you a doubly jagged array already is because it's such a large performance difference that I thought it necessary to make it optional. With the performance difference gone, Uproot 4 will just return the doubly jagged array directly. There won't be anything to build. – Jim Pivarski Feb 17 '20 at 20:09
  • @JimPivarski has the functionality for returning doubly jagged arrays been implemented lately? The `fromiter` method to convert the `ObjectArray` is very slow for exploratory uses. – Chami Sangeeth Amarasinghe May 17 '20 at 19:32
  • It has been implemented in Awkward1 (https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_iter.html), but Awkward1 still needs to be integrated into Uproot. (That's what I'm working on right now, in fact.) For the moment, you have to move arrays from Uproot to Awkward1 manually. Note that there's an ak.from_awkward0 function (https://awkward-array.readthedocs.io/en/latest/_auto/ak.from_awkward0.html) that I think handles both ObjectArrays and non-ObjectArrays optimally. – Jim Pivarski May 17 '20 at 22:55