Actually, there is a property that does exactly this. It's in the public API, but not in the "high level" part intended for data analysts: it's meant for downstream libraries that build on top of Awkward Array (and Awkward Array uses it internally quite a lot).
In an array's (low-level) layout, there's a property called minmax_depth.
>>> import awkward as ak
>>> arr = ak.Array([[1, 2, 3], [3, 2], [], [5], [6, 9, 6, 9]])
>>> arr.layout.minmax_depth
(2, 2)
Here, the minimum and maximum are both 2 because this is a relatively simple type. But a heterogeneous union can have a different minimum and maximum:
>>> arr = ak.Array([1, [2, 3, [4, 5, 6]]])
>>> arr.layout.minmax_depth
(1, 3)
and (as a more common case), records can introduce different levels of depth:
>>> arr = ak.Array([{"x": 1, "y": [{"z": [[[2]]]}]}])
>>> arr.layout.minmax_depth
(1, 5)
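To make the counting rule concrete, here is a minimal pure-Python sketch of the same idea applied to ordinary nested lists and dicts. It is not how Awkward Array computes the property (the library works from the layout, not from the values). The name nested_minmax_depth is made up for illustration, and treating an empty list as exactly one level deep is an assumption, since plain Python values carry no type information.

```python
def nested_minmax_depth(value):
    """Return (min_depth, max_depth) for plain nested Python data.

    Hypothetical helper for illustration only; not part of Awkward Array.
    """
    if isinstance(value, dict):
        # A record adds no depth of its own; each field may differ in depth.
        if not value:
            return (0, 0)
        pairs = [nested_minmax_depth(v) for v in value.values()]
        return (min(lo for lo, _ in pairs), max(hi for _, hi in pairs))
    if isinstance(value, list):
        # Assumption: an empty list counts as one level, contents unknown.
        if not value:
            return (1, 1)
        pairs = [nested_minmax_depth(v) for v in value]
        # A list adds one level on top of whatever it contains.
        return (1 + min(lo for lo, _ in pairs), 1 + max(hi for _, hi in pairs))
    # Numbers and other scalars have depth 0.
    return (0, 0)


print(nested_minmax_depth([[1, 2, 3], [3, 2], [], [5], [6, 9, 6, 9]]))
print(nested_minmax_depth([1, [2, 3, [4, 5, 6]]]))
print(nested_minmax_depth([{"x": 1, "y": [{"z": [[[2]]]}]}]))
```

On the three examples above, written as plain Python lists, this prints (2, 2), (1, 3), and (1, 5).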
There are also variants on this, like branch_depth (bool for is-branching? and minimum depth) and purelist_depth (depth of just lists and missing value nodes, not records or unions).
>>> arr = ak.Array([{"x": 1, "y": [{"z": [[[2]]]}]}])
>>> arr.layout.branch_depth
(True, 1)
>>> arr.layout.purelist_depth
1
The fact that different parts of the array can have different depths (unlike a NumPy array, in which it's always ndim or len(shape)) is important for interpreting the axis parameter. Unlike in NumPy, a negative axis can mean different levels of depth in different parts of the array (because it counts up from the deepest level, which can be different in different parts).
>>> arr = ak.Array([{"x": [1, 2, 3], "y": [[1, 2, 3], []]}])
>>> ak.sum(arr, axis=-1)
<Record {x: [6], y: [[6, 0]]} type='{"x": var * int64, "y": var * var * int64}'>
Above, the y field is deeper than the x field, but axis=-1 means to sum along the deepest axis, wherever that is.
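The "deepest axis, wherever that is" rule can be imitated on plain Python data with a short recursive sketch. This is only an illustration, not Awkward Array's implementation: sum_deepest is a hypothetical helper, it assumes the innermost lists contain only numbers, and it returns plain Python objects rather than an Awkward Array or Record.

```python
from numbers import Number


def sum_deepest(value):
    """Sum the innermost lists of plain nested Python data.

    Hypothetical helper for illustration only; not part of Awkward Array.
    """
    if isinstance(value, dict):
        # Records are transparent: reduce each field independently.
        return {key: sum_deepest(field) for key, field in value.items()}
    if isinstance(value, list):
        if all(isinstance(item, Number) for item in value):
            # An innermost list (possibly empty): this is the deepest axis.
            return sum(value)
        # Not yet at the deepest level: keep descending.
        return [sum_deepest(item) for item in value]
    return value


record = {"x": [1, 2, 3], "y": [[1, 2, 3], []]}
print(sum_deepest(record))
```

Applied to this single record, it gives {'x': 6, 'y': [6, 0]}: x is reduced one level down and y two levels down, even though the same rule was applied to both.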