If you are happy with `list` instead of `tuple`, this could be achieved with the following trick:
- convert your array to a `list` of `list`s using `.tolist()`
- make sure that you change the size of one of the innermost `list`s (misalign)
- convert the `list` of `list`s back to a NumPy array
- fix the modification of point 2.
This is implemented in the following function `last_dim_as_list()`:
import numpy as np


def last_dim_as_list(arr):
    if arr.ndim > 1:
        # convert to list of lists
        arr_list = arr.tolist()
        # misalign size of the first innermost list
        temp = arr_list
        for _ in range(arr.ndim - 1):
            temp = temp[0]
        temp.append(None)
        # convert to NumPy array
        # (`dtype=object` is needed because of the misalignment;
        # recent NumPy versions raise an error for ragged input without it)
        result = np.array(arr_list, dtype=object)
        # revert the misalignment
        temp.pop()
    else:
        result = np.empty(1, dtype=object)
        result[0] = arr.tolist()
    return result


np.random.seed(0)
in_arr = np.random.randint(0, 9, (2, 3, 2))
out_arr = last_dim_as_list(in_arr)
print(in_arr)
# [[[5 0]
#   [3 3]
#   [7 3]]
#
#  [[5 2]
#   [4 7]
#   [6 8]]]
print(in_arr.shape)
# (2, 3, 2)
print(in_arr.dtype)
# int64
print(out_arr)
# [[list([5, 0]) list([3, 3]) list([7, 3])]
#  [list([5, 2]) list([4, 7]) list([6, 8])]]
print(out_arr.shape)
# (2, 3)
print(out_arr.dtype)
# object
However, I would NOT recommend taking this route unless you really know what you are doing.
Most of the time you are better off keeping everything as a NumPy array of higher dimensionality and making good use of NumPy indexing.
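To illustrate that last point, here is a minimal sketch of how some typical "per-pair" operations can be written directly on the higher-dimensional array. The specific operations (picking the first item, summing each pair, filtering pairs by their sum) and the small `np.arange()` input are my own illustrative choices, not something required by the question.

```python
import numpy as np

# small deterministic input: 2 x 3 "pairs" stored along the last axis
in_arr = np.arange(12).reshape(2, 3, 2)

# the "pair" at position (1, 2) is simply a slice along the last axis
pair = in_arr[1, 2]
print(pair)
# [10 11]

# first item of every pair, without building any list
firsts = in_arr[..., 0]
print(firsts.shape)
# (2, 3)

# reduce over the last axis instead of looping over lists
sums = in_arr.sum(axis=-1)
print(sums)
# [[ 1  5  9]
#  [13 17 21]]

# boolean mask over the first two axes: keep pairs whose sum exceeds 8
selected = in_arr[sums > 8]
print(selected.shape)
# (4, 2)
```

All of these stay in a native numeric `dtype`, so they avoid the per-element Python objects that an `object` array of lists would carry.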
Note that the conversion to lists could also be done with explicit loops, but the proposed approach should be much faster for large enough inputs:
def last_dim_as_list_loop(arr):
    shape = arr.shape
    result = np.empty(arr.shape[:-1], dtype=object).ravel()
    for k in range(arr.shape[-1]):
        for i in range(result.size):
            if k == 0:
                result[i] = []
            result[i].append(arr[..., k].ravel()[i])
    return result.reshape(shape[:-1])


out_arr2 = last_dim_as_list_loop(in_arr)
print(out_arr2)
# [[list([5, 0]) list([3, 3]) list([7, 3])]
#  [list([5, 2]) list([4, 7]) list([6, 8])]]
print(out_arr2.shape)
# (2, 3)
print(out_arr2.dtype)
# object
But the timings for this loop-based version are not exactly spectacular:
%timeit last_dim_as_list(in_arr)
# 2.53 µs ± 37.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit last_dim_as_list_loop(in_arr)
# 12.2 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
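`%timeit` is an IPython magic, so the lines above only work in IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison could be made with the standard `timeit` module. The repetition count `n` and the `np.arange()` input are arbitrary choices of mine, and `dtype=object` is passed explicitly to `np.array()` on the assumption that a recent NumPy is in use. No timings are shown because they depend on the machine.

```python
import timeit

import numpy as np


def last_dim_as_list(arr):
    if arr.ndim > 1:
        arr_list = arr.tolist()
        # misalign the first innermost list so NumPy stops one level early
        temp = arr_list
        for _ in range(arr.ndim - 1):
            temp = temp[0]
        temp.append(None)
        # explicit dtype=object: ragged input is an error otherwise
        # in recent NumPy versions
        result = np.array(arr_list, dtype=object)
        # revert the misalignment
        temp.pop()
    else:
        result = np.empty(1, dtype=object)
        result[0] = arr.tolist()
    return result


def last_dim_as_list_loop(arr):
    shape = arr.shape
    result = np.empty(arr.shape[:-1], dtype=object).ravel()
    for k in range(arr.shape[-1]):
        for i in range(result.size):
            if k == 0:
                result[i] = []
            result[i].append(arr[..., k].ravel()[i])
    return result.reshape(shape[:-1])


in_arr = np.arange(12).reshape(2, 3, 2)

# sanity check: both versions should hold the same values as the input
out_trick = last_dim_as_list(in_arr)
out_loop = last_dim_as_list_loop(in_arr)
same = (
    out_trick.shape == out_loop.shape == in_arr.shape[:-1]
    and out_trick.tolist() == out_loop.tolist() == in_arr.tolist()
)
print(same)

# average time per call over n repetitions (n is an arbitrary choice)
n = 1000
t_trick = timeit.timeit(lambda: last_dim_as_list(in_arr), number=n) / n
t_loop = timeit.timeit(lambda: last_dim_as_list_loop(in_arr), number=n) / n
print(f"trick: {t_trick * 1e6:.2f} µs per call")
print(f"loop:  {t_loop * 1e6:.2f} µs per call")
```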
The `view`-based approach proposed by @PaulPanzer is very elegant and more efficient than the trick proposed in `last_dim_as_list()` because it loops (internally) through the array only once as compared to twice:
def last_dim_as_tuple(arr):
    dtype = [(str(i), arr.dtype) for i in range(arr.shape[-1])]
    return arr.view(dtype)[..., 0].astype(object)

and therefore the timings on large enough inputs are more favorable:
in_arr = np.random.random((6602, 3176, 2))
%timeit last_dim_as_list(in_arr)
# 4.9 s ± 73.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit last_dim_as_tuple(in_arr)
# 3.07 s ± 117 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
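For completeness, here is a small usage sketch of `last_dim_as_tuple()` on a deterministic input, so that the shape and `dtype` of the result can be compared with the `list`-based version. The `np.ascontiguousarray()` call in the second part is my own addition: `.view()` with a larger item size needs the last axis to be contiguous in memory, so a transposed input has to be copied first.

```python
import numpy as np


def last_dim_as_tuple(arr):
    # one field per item along the last axis, all with the array's dtype
    dtype = [(str(i), arr.dtype) for i in range(arr.shape[-1])]
    # the structured view has a trailing axis of size 1, dropped by [..., 0]
    return arr.view(dtype)[..., 0].astype(object)


in_arr = np.arange(12).reshape(2, 3, 2)
out_arr = last_dim_as_tuple(in_arr)
print(out_arr.shape)
# (2, 3)
print(out_arr.dtype)
# object

# each element groups the items that were along the last axis
first = tuple(out_arr[0, 0])
last = tuple(out_arr[1, 2])

# a transposed array no longer has a contiguous last axis,
# so make a contiguous copy before taking the structured view
swapped = np.ascontiguousarray(in_arr.transpose(0, 2, 1))
out_swapped = last_dim_as_tuple(swapped)
print(out_swapped.shape)
# (2, 2)
```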