
In Numpy, I want to create an array of integer arrays (or lists). Each individual array is a set of indices. These individual arrays generally have different lengths, but sometimes all have the same length.

When the lengths are different, I can create the array as

test = np.array([[1,2],[1,2,3]],dtype=object)

When I do this, test[0] is a list of integers and I can use other_array[test[0]] without issue.

However, when test happens to have entries all the same size and I do

test = np.array([[1,2],[1,3]], dtype=object)

then test[0] is a Numpy array of dtype object. When I use other_array[test[0]] I get an error that arrays used as indices must be of integer (or boolean) type.

Here is a complete example:

import numpy as np

other_array = np.array([0,1,2,3])
test1 = np.array([[1,2],[1,2,3]], dtype=object)
print(other_array[test1[0]])  # this works: test1[0] is a list of ints

test2 = np.array([[1,2],[1,3]], dtype=object)
print(other_array[test2[0]])  # this fails: test2[0] is an object-dtype array

The only way I have found around this issue is to check if test will be ragged or not before creating it and use dtype=int when it happens to have arrays of all the same size. This seems inefficient. Is there a generic way to create an array of integer arrays that is sometimes ragged and sometimes not without checking for raggedness?

  • numpy is intended for compact and fast operations on (sometimes multi-dimensional) arrays of values. Is there a reason you can't just use ordinary Python lists? numpy really isn't saving you any space or time when you're storing arbitrary-sized containers like this. – Silvio Mayolo Aug 17 '21 at 21:38
  • @SilvioMayolo In other parts of the code, I do have reasons to be using a Numpy array, as opposed to an ordinary Python list. There are times I need to be able to call `test[[1,4,5]]` efficiently as well, which you can't do with a list, as far as I know. – Combinatorialist Aug 17 '21 at 21:45
  • @SilvioMayolo, for this kind of indexing `other_array` has to be a `numpy` array. `test1` and `test2` can be lists of lists. Object dtype arrays are a lot like lists, storing references. While sometimes convenient, they rarely are better. – hpaulj Aug 17 '21 at 23:51
  • Lists are perfect for storing ordered, indexable collections of objects of different size/type (e.g. NumPy arrays). I can't think of any cases where a numpy array of lists would be better. Except maybe high dimensional (nd>2) arrays of objects. – Bill Aug 18 '21 at 00:52
  • I think maybe a numpy object array of unequal-sized arrays is useful when you want to serialize objects as `.npy`, which is somewhat more compact than pickle? – MRule May 20 '22 at 12:08

2 Answers


To consistently make an object dtype array, you need to initialize one of the right size, and then assign the list to it:

In [86]: res = np.empty(2, object)
In [87]: res
Out[87]: array([None, None], dtype=object)
In [88]: res[:] = [[1,2],[1,2,3]]
In [89]: res
Out[89]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [90]: res[:] = [[1,2],[1,3]]
In [91]: res
Out[91]: array([list([1, 2]), list([1, 3])], dtype=object)

You can't assign a (2,n) array this way:

In [92]: res[:] = np.array([[1,2],[1,3]])
Traceback (most recent call last):
  File "<ipython-input-92-f05200126d48>", line 1, in <module>
    res[:] = np.array([[1,2],[1,3]])
ValueError: could not broadcast input array from shape (2,2) into shape (2,)

but a list of arrays works:

In [93]: res[:] = [np.array([1,2]),np.array([1,3])]
In [94]: res
Out[94]: array([array([1, 2]), array([1, 3])], dtype=object)
In [95]: res[:] = list(np.array([[1,2],[1,3]]))
In [96]: res
Out[96]: array([array([1, 2]), array([1, 3])], dtype=object)

The basic point is that multidimensional numeric dtype arrays are the preferred kind, while object dtype is a fall-back option, especially when using np.array(). With some combinations of array shapes, np.array will even raise an error rather than create an object dtype array. So create-and-fill is the only approach that behaves consistently.
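The create-and-fill pattern can be wrapped in a small helper. This is only a minimal sketch: the name `to_object_array` is hypothetical (it is not a NumPy function), and it assumes the input is a sequence of 1-D lists of integer indices. Storing each entry as an integer array, rather than a list, is a choice made here so that every element can be used as an index whether or not the input is ragged.

```python
import numpy as np

def to_object_array(seqs):
    """Return a 1-D object array with one element per sequence in `seqs`.

    Hypothetical helper, not part of NumPy. Each element is stored as an
    integer array so it can be used directly as an index.
    """
    out = np.empty(len(seqs), dtype=object)   # shape (n,), filled with None
    for i, s in enumerate(seqs):
        out[i] = np.asarray(s, dtype=int)     # assign one element at a time
    return out

other_array = np.array([0, 1, 2, 3])
ragged = to_object_array([[1, 2], [1, 2, 3]])
even = to_object_array([[1, 2], [1, 3]])

print(other_array[ragged[0]])  # index with the first entry of the ragged case
print(other_array[even[0]])    # same code path, no raggedness check
print(even.shape)              # stays 1-D even though the rows have equal length
```

Because the result is always a 1-D object array, selecting several entries at once with something like `even[[0, 1]]` should also behave the same in both cases.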

Here is what np.array does with your test1 and test2:

In [97]: np.array([[1,2],[1,2,3]], dtype=object)
Out[97]: array([list([1, 2]), list([1, 2, 3])], dtype=object)
In [98]: np.array([[1,2],[1,2,3]], dtype=object)[0]
Out[98]: [1, 2]
In [99]: np.array([[1,2],[1,3]], dtype=object)
Out[99]: 
array([[1, 2],
       [1, 3]], dtype=object)
In [100]: np.array([[1,2],[1,3]], dtype=object)[0]
Out[100]: array([1, 2], dtype=object)
In [103]: np.array([[1,2],[1,3]])[0]
Out[103]: array([1, 2])

But I wonder if there's any need to make an array from the list of lists at all. If you are just using the sublists as indices, indexing the list itself is just as good:

In [105]: [[1,2],[1,3]][0]
Out[105]: [1, 2]
In [106]: [[1,2],[1,2,3]][0]
Out[106]: [1, 2]
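As a rough illustration of that point, the sketch below keeps the indices as plain Python lists and only uses NumPy for the array being indexed. The values in `other_array` and the names `index_lists`, `picked` and `results` are made up for the example.

```python
import numpy as np

other_array = np.array([10, 11, 12, 13])

results = []
# One ragged and one equal-length list of index lists; no special casing.
for index_lists in ([[1, 2], [1, 2, 3]], [[1, 2], [1, 3]]):
    picked = [other_array[idx] for idx in index_lists]
    results.append(picked)
    print(picked)
```

The trade-off is that picking several sublists at once, as `test[[1,4,5]]` would on an array, needs a comprehension such as `[index_lists[i] for i in (1, 4, 5)]`.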

Note that np.nonzero (equivalently, np.where called with only a condition) returns a tuple of arrays, one per dimension. This tuple can be used directly as a multidimensional index. np.argwhere applies transpose to that tuple, creating an (n,ndim) array. That looks nice, but it can't be used for indexing directly.
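A small sketch of the difference, using a made-up 2x3 array `a`:

```python
import numpy as np

a = np.array([[0, 5, 0],
              [7, 0, 9]])

idx = np.nonzero(a)        # tuple of arrays, one per dimension
print(idx)
print(a[idx])              # [5 7 9]

pairs = np.argwhere(a)     # one (row, col) pair per nonzero element
print(pairs.shape)         # (3, 2)

# To index with argwhere output, split it back into per-dimension arrays.
print(a[tuple(pairs.T)])   # [5 7 9]
```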

hpaulj

You may have a good reason to use NumPy for this. To make the failing case work, you could unpack the second list first. This works for both ragged and equal-length input, and you don’t need any checks either.

test2 = np.array([[1,2],*[1,3]], dtype=object)
print(other_array[test2[0]]) #this works
The.B
  • I don't see how a `array([list([1, 2]), 1, 3], dtype=object)` helps. Sure `test2[0]` will be a list, but `test2[1]` will be an int, not a list. – hpaulj Aug 17 '21 at 23:18
  • Sorry, I missed that he wanted list outputs only. – The.B Aug 18 '21 at 08:33