Python toolkit for manipulating nested data structures as though they were NumPy arrays.
awkward is a Python library for computing array-at-a-time ("vectorized") operations on nested and irregular-length data structures. Its interface resembles NumPy's as closely as possible, and the library itself is implemented using NumPy.
It is intended as an interactive analysis toolkit for datasets that can't be reduced to rectilinear arrays. For example, a dataset of extrasolar planets can have arbitrarily many planets per star, and each planet has several attributes. A dataset of particle physics collisions contains collision event records, each with arbitrarily many electron records, muon records, photon records, etc. Instead of looping over these constructs in a general-purpose language, awkward-array lets the user slice them like (irregular) multidimensional arrays, project through columns, sum over variable-sized sets, etc.
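To make "sum over variable-sized sets" concrete, here is a minimal sketch in plain NumPy. It assumes a flat `content` array plus an `offsets` array marking where each star's planets begin and end. That layout is a common way to store irregular-length lists and is used here only for illustration; it is not a description of awkward's internals, and the masses are made-up values.

```python
import numpy

# Hypothetical data: planet masses for three stars with 2, 0, and 3 planets.
content = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0])
# Star i owns content[offsets[i]:offsets[i + 1]].
offsets = numpy.array([0, 2, 2, 5])

# Number of planets per star.
counts = numpy.diff(offsets)

# Per-star sums without a Python loop: differences of a running total.
# This also gives 0.0 for a star with no planets.
cumulative = numpy.concatenate(([0.0], numpy.cumsum(content)))
sums = cumulative[offsets[1:]] - cumulative[offsets[:-1]]

# "Slicing" the irregular dimension: first planet of each star that has one.
first = content[offsets[:-1][counts > 0]]

print(counts.tolist())  # [2, 0, 3]
print(sums.tolist())    # [3.0, 0.0, 12.0]
print(first.tolist())   # [1.0, 3.0]
```

Every step operates on whole arrays, which is the kind of array-at-a-time computation the library is meant to expose without making the user manage offsets by hand.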
As an artificial example, consider this structure:
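import awkward
import numpy

# Nested lists, missing values (None), and records (dicts) in one structure.
complicated = awkward.fromiter(
    [[1.21, 4.84, None, 10.89, None],
     [19.36, [30.25]],
     [{"x": 36, "y": {"z": 49}}, None, {"x": 64, "y": {"z": 81}}]
    ])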
Once in awkward form, we can apply NumPy operations, such as ufuncs:
numpy.sqrt(complicated).tolist()
# [[1.1, 2.2, None, 3.3000000000000003, None],
# [4.4, [5.5]],
# [{'x': 6.0, 'y': {'z': 7.0}}, None, {'x': 8.0, 'y': {'z': 9.0}}]]
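For comparison, the same result can be produced with an explicit element-by-element walk in pure Python. The helper below, `sqrt_nested`, is a hypothetical name introduced only for this illustration; it shows what the ufunc call is semantically equivalent to, not how awkward evaluates it.

```python
import math

def sqrt_nested(value):
    """Apply math.sqrt to every number in nested lists and dicts; pass None through."""
    if value is None:
        return None
    if isinstance(value, list):
        return [sqrt_nested(item) for item in value]
    if isinstance(value, dict):
        return {key: sqrt_nested(item) for key, item in value.items()}
    return math.sqrt(value)

data = [[1.21, 4.84, None, 10.89, None],
        [19.36, [30.25]],
        [{"x": 36, "y": {"z": 49}}, None, {"x": 64, "y": {"z": 81}}]]

result = sqrt_nested(data)
print(result[2][0])  # {'x': 6.0, 'y': {'z': 7.0}}
```

The recursive version visits one Python object at a time, whereas the ufunc form expresses the whole operation as a single call on the array.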
awkward-array interfaces with apache-arrow, h5py, pandas, numba, and uproot.