I have a list of 2D arrays of different shapes: `lst` (see the example below). I would like to concatenate them into a 3D array of shape `(len(lst), maxx, maxy)`, where `maxx` is the maximum `.shape[0]` of all the arrays and `maxy` is the maximum `.shape[1]`. If an array's shape is smaller than `(maxx, maxy)`, that array should start at the top-left corner, and all the missing values should be filled with some value of choice (e.g. 0 or `np.nan`).
Example:
```python
lst = [np.array([[1, 2],
                 [3, 4]]),
       np.array([[1, 2, 3],
                 [4, 5, 6]]),
       np.array([[1, 2],
                 [3, 4],
                 [5, 6]])]

# maxx == 3
# maxy == 3

result = np.array([[[1, 2, 0],
                    [3, 4, 0],
                    [0, 0, 0]],

                   [[1, 2, 3],
                    [4, 5, 6],
                    [0, 0, 0]],

                   [[1, 2, 0],
                    [3, 4, 0],
                    [5, 6, 0]]])
```
Notes:

- `np.concatenate` requires the shapes of all arrays to match.
- This question is similar, but it is only for 1D arrays.
A sub-problem: as a special case, you may assume that `.shape[1]` is the same for all arrays (i.e. `.shape[1] == maxy` for every array). For example:
```python
lst = [np.array([[1, 2, 3],
                 [4, 5, 6]]),
       np.array([[1, 2, 3]]),
       np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])]
```
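For this special case, one possible sketch (assuming NumPy only and 0 as the fill value) pads each array with trailing zero rows via `np.pad` and then stacks along a new leading axis:

```python
import numpy as np

lst = [np.array([[1, 2, 3],
                 [4, 5, 6]]),
       np.array([[1, 2, 3]]),
       np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])]

maxx = max(a.shape[0] for a in lst)
# Pad each array with (maxx - rows) zero rows at the bottom and no column
# padding, then stack the equally-shaped results along a new first axis.
result = np.stack([np.pad(a, ((0, maxx - a.shape[0]), (0, 0))) for a in lst])
```

`np.pad` defaults to constant padding with 0; passing `constant_values=np.nan` (on a float dtype) would give NaN fill instead.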
Bonus (a hard question): can this be applied to more dimensions? E.g., while concatenating 3D arrays into a 4D array, all the 3D arrays (rectangular parallelepipeds) would start at the same corner, and if their shapes are too small, the missing values (up to the edges) would be filled with 0 or `np.nan`.
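For the bonus, a hedged sketch of one dimension-agnostic approach (the helper name `stack_padded` is made up for illustration): allocate the full output with `np.full` and copy each array into its corner using a tuple of slices.

```python
import numpy as np

def stack_padded(arrays, fill=0):
    # Hypothetical helper: works for 2D, 3D, ... inputs of equal ndim.
    ndim = arrays[0].ndim
    full_shape = (len(arrays),) + tuple(
        max(a.shape[d] for a in arrays) for d in range(ndim))
    out = np.full(full_shape, fill, dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        # Each array occupies the "corner" block starting at index 0 in
        # every dimension; the rest of the slot keeps the fill value.
        out[(i,) + tuple(slice(n) for n in a.shape)] = a
    return out
```

The per-array copy is a Python-level loop, but each iteration is a single contiguous NumPy assignment, so it tends to scale reasonably even for thousands of arrays.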
How to do this at all? How to do it efficiently (potentially for thousands of arrays, each with thousands of elements)? Maybe by creating an array of the final shape and filling it in some vectorized way? Or by converting all the arrays into DataFrames and concatenating them with `pd.concat`? Maybe SciPy has some helpful functions for this?
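Along the "create the final array and fill it in a vectorized way" idea, one possible sketch (assuming all arrays share a dtype) builds a boolean mask by broadcasting and assigns every value in a single fancy-indexing step:

```python
import numpy as np

lst = [np.array([[1, 2], [3, 4]]),
       np.array([[1, 2, 3], [4, 5, 6]]),
       np.array([[1, 2], [3, 4], [5, 6]])]

shapes = np.array([a.shape for a in lst])   # (n, 2): rows and cols per array
maxx, maxy = shapes.max(axis=0)
out = np.zeros((len(lst), maxx, maxy), dtype=lst[0].dtype)
# mask[i, r, c] is True where (r, c) lies inside lst[i]'s top-left block.
mask = (np.arange(maxx)[:, None] < shapes[:, 0, None, None]) \
     & (np.arange(maxy) < shapes[:, 1, None, None])
# Boolean assignment fills in C order, which matches each array's ravel order.
out[mask] = np.concatenate([a.ravel() for a in lst])
```

A plain per-array copy loop may well be fast enough in practice; the mask version mainly helps when the list is very long and the individual arrays are small.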