I have a 2D Python list whose rows have varying lengths. I would like to convert it to a NumPy array, prepending or appending a value (e.g. 0) to the rows that are shorter than the longest one.

Creating a ragged array doesn't really work, since I need the shape (and other functions) of the array. Additionally I think ragged arrays would be slower, though I have no evidence for this.

The following code does what I want to do, but I was hoping for a more efficient solution:

import numpy as np
python_list = [[1,2], [1,2,3], [1,2,3,4,5,6,7,8,9]]
N = len(python_list)
M = max(map(len, python_list))
value = 0
arr = np.full((N, M), value)
for i, l in enumerate(python_list):
    arr[i, :len(l)] = l     # append value (pad on the right)
    # arr[i, -len(l):] = l  # prepend value (pad on the left); use instead of the line above

xod

1 Answer

There are many SO questions like this, but it's quicker to suggest this (than to look those up).

Since you are starting with lists, might as well use a list method:

In [13]: from itertools import zip_longest
In [14]: python_list = [[1,2], [1,2,3], [1,2,3,4,5,6,7,8,9]]
In [15]: list(zip_longest(*python_list,fillvalue=0))
Out[15]: 
[(1, 1, 1),
 (2, 2, 2),
 (0, 3, 3),
 (0, 0, 4),
 (0, 0, 5),
 (0, 0, 6),
 (0, 0, 7),
 (0, 0, 8),
 (0, 0, 9)]
In [16]: np.transpose(list(zip_longest(*python_list,fillvalue=0)))
Out[16]: 
array([[1, 2, 0, 0, 0, 0, 0, 0, 0],
       [1, 2, 3, 0, 0, 0, 0, 0, 0],
       [1, 2, 3, 4, 5, 6, 7, 8, 9]])

While there are other ways, they all require iterating on the lists to find the max length.
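The question also asks about prepending the fill value, which the zip_longest call above does not cover directly. One way to get it, as a rough sketch: reverse each row, pad as before, then flip the columns back. The function name pad_with_zip_longest and its side parameter are made up for illustration, and the sketch assumes a non-empty list of plain Python lists.

```python
from itertools import zip_longest

import numpy as np


def pad_with_zip_longest(rows, fill=0, side="right"):
    """Pad ragged rows to a rectangular array (hypothetical helper).

    side="right" appends the fill value, side="left" prepends it.
    Assumes `rows` is a non-empty list of lists.
    """
    if side == "left":
        # Reverse each row so that zip_longest pads what will become the front.
        rows = [row[::-1] for row in rows]
    # zip_longest yields columns, so transpose to get one row per input list.
    padded = np.array(list(zip_longest(*rows, fillvalue=fill))).T
    # Undo the reversal for left padding.
    return padded[:, ::-1] if side == "left" else padded


python_list = [[1, 2], [1, 2, 3], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
print(pad_with_zip_longest(python_list, side="left"))
```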

There's a clever method involving a 2D boolean index mask that's often included in this kind of SO question. It can be fast, but it always takes me a while to reproduce:

In [33]: lens = np.array([len(x) for x in python_list])
In [34]: idx = np.arange(lens.max())<lens[:,None]
In [35]: idx
Out[35]: 
array([[ True,  True, False, False, False, False, False, False, False],
       [ True,  True,  True, False, False, False, False, False, False],
       [ True,  True,  True,  True,  True,  True,  True,  True,  True]])
In [36]: res = np.zeros((len(python_list), max(lens)),int)
In [37]: res[idx]=np.concatenate(python_list)
In [38]: res
Out[38]: 
array([[1, 2, 0, 0, 0, 0, 0, 0, 0],
       [1, 2, 3, 0, 0, 0, 0, 0, 0],
       [1, 2, 3, 4, 5, 6, 7, 8, 9]])
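
For reuse, the mask steps can be wrapped in a function. This is only a sketch: pad_with_mask and its side parameter are names invented here, and it assumes a non-empty list of non-empty numeric lists. For left padding the mask is True in the last len(row) columns of each row instead of the first; boolean assignment fills True positions in row-major order, so the concatenated values still land in the right rows.

```python
import numpy as np


def pad_with_mask(rows, fill=0, side="right"):
    """Pad ragged rows using a 2D boolean mask (hypothetical helper).

    side="right" appends the fill value, side="left" prepends it.
    """
    lens = np.array([len(row) for row in rows])
    width = lens.max()
    cols = np.arange(width)
    if side == "right":
        # True for the first len(row) columns of each row.
        mask = cols < lens[:, None]
    else:
        # True for the last len(row) columns of each row.
        mask = cols >= (width - lens)[:, None]
    data = np.concatenate(rows)
    # Assumption: the fill value is representable in the data's dtype.
    out = np.full((len(rows), width), fill, dtype=data.dtype)
    out[mask] = data
    return out


python_list = [[1, 2], [1, 2, 3], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
print(pad_with_mask(python_list, side="left"))
```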
hpaulj
  • This worked great. Funny how using "ragged" instead of "jagged" array gave no relevant results. Thanks for the elegant iterating solution :) – xod Oct 19 '21 at 16:50