2

Say we have a jagged array that looks like this:

arr = ak.Array([[1, 2, 3], [3, 2], [], [5], [6, 9, 6, 9]])

We can see that it has a depth of 2. Is there something like a single or combination of built-in functions that would tell me that?

ak.size requires that I already know the depth and ak.to_numpy -> np.size would give me an error for incompatibility. I'm looking for something inbuilt because I need it to be fast.

Thanks!

Edit: I forgot to mention that I would like to solve this in the case where the given array is guaranteed to have a uniform depth and consists entirely of numbers.

aaportel
  • 471
  • 1
  • 3
  • 6
  • 1
    I'm assuming such a function wouldn't exist since the definition of depth may be complicated. For example - what if of of the levels has a dictionary? Do you dive into the keys / values? what about other more complex objects? This sounds to me like a specific nead with specific types of values, am I correct? I can draft a quick function for that if it will help you, which will be relevant for your case – Guy Aug 12 '22 at 19:55

3 Answers3

3

Actually, there is a function that just does this, and it's in the public API, but not the "high level" parts intended for data analysts. It's intended for downstream libraries to build on top of Awkward Array (and Awkward Array uses it internally quite a lot).

In an array's (low level) layout, there's a property called minmax_depth.

>>> import awkward as ak
>>> arr = ak.Array([[1, 2, 3], [3, 2], [], [5], [6, 9, 6, 9]])
>>> arr.layout.minmax_depth
(2, 2)

Here, the minimum and maximum are both 2 because this is a relatively simple type. But a heterogeneous union can have a different minimum and maximum:

>>> arr = ak.Array([1, [2, 3, [4, 5, 6]]])
>>> arr.layout.minmax_depth
(1, 3)

and (as a more common case), records can introduce different levels of depth:

>>> arr = ak.Array([{"x": 1, "y": [{"z": [[[2]]]}]}])
>>> arr.layout.minmax_depth
(1, 5)

There are also variants on this like branch_depth (bool for is-branching? and minimum depth) and purelist_depth (depth of just lists and missing value nodes, not records or unions).

>>> arr = ak.Array([{"x": 1, "y": [{"z": [[[2]]]}]}])
>>> arr.layout.branch_depth
(True, 1)
>>> arr.layout.purelist_depth
1

The fact that different parts of the array can have different depths (unlike a NumPy array, in which it's always ndim or len(shape)) is important for interpreting the axis parameter. Unlike NumPy, negative axis can mean different levels of depth in different parts of the array (because it's counting up from the deepest, which can be different in different parts).

>>> arr = ak.Array([{"x": [1, 2, 3], "y": [[1, 2, 3], []]}])
>>> ak.sum(arr, axis=-1)
<Record {x: [6], y: [[6, 0]]} type='{"x": var * int64, "y": var * var * int64}'>

Above, the y field is deeper than the x field, but axis=-1 means to sum along the deepest axis, wherever that is.

Jim Pivarski
  • 5,568
  • 2
  • 35
  • 47
2

Simple code which does this -

import awkward as ak

TYPES_TO_ITERATE = (ak.Array, list)
def depth(arr):
    return max([depth(i) for i in arr if isinstance(i, TYPES_TO_ITERATE)], default=0) + 1

Example-

>>> parr = ak.Array([[1, 2, 3], [3, 2], [], [5], [6, 9, 6, 9]])
>>> print(depth(arr))
2

Hope this helps :)

Guy
  • 335
  • 1
  • 12
  • That's unfortunately quite slow. But it does give me an idea on how to do it without having to iterate over `arr`! I'll post if it ends up working out. Thanks! – aaportel Aug 12 '22 at 20:33
0

It turns out this was actually incredibly simple given my constraints.

TYPES_TO_ITERATE = (ak.Array, list)
def depth(arr):
    if (len(arr) == 0) or (not isinstance(arr[0], TYPES_TO_ITERATE)): 
        return 1
    else:
        return 1 + depth(arr[0])

arr = ak.Array([[[]], [[3], [2]], [[[20]]], [[5]], [[6, 9], [6, 9]]])
depth(arr)

3

Basic data-structures algo. Since there's uniform depth, I don't need to search every branch for the deepest depth. Just the first branches.

aaportel
  • 471
  • 1
  • 3
  • 6