70

I want to create a numpy array in which each element must be a list, so later I can append new elements to each.

I have already searched Google and Stack Overflow, but I could not find an answer.

The main issue is that numpy assumes a nested list should become a multidimensional array, which is not what I am looking for.

Ricardo Silveira
  • Why not create a 2D-array? – Nayeem Zen Nov 29 '15 at 13:04
  • why `numpy` array of lists? why not list of `numpy` arrays? or list of lists? – Shai Nov 29 '15 at 13:04
  • I have special requirements. – Ricardo Silveira Nov 29 '15 at 13:41
  • Please explain your "special requirements" in more detail. If your primary concern is with the speed of append operations then you can't do much better than a regular Python list-of-lists, since appending to a list is very cheap compared with array concatenation. However, this comes at a big storage and performance cost assuming that you want to perform numerical operations on your list-of-lists. Is every sub-list going to have the same length, or are you trying to represent a 'ragged' array with different row lengths? – ali_m Nov 29 '15 at 15:06
  • There are lots of SO questions about creating arrays of `dtype=object`. – hpaulj Nov 29 '15 at 16:03
  • @ali_m, adjacency list for sparse graphs. I need the operations to be O(m+n). lists of lists kill that. Also each list may have different sizes. I need the first vector to be an array of O(1) for accessing each element, and then a list to access it at O(d_max) for each element of the given list. – Ricardo Silveira Nov 29 '15 at 22:10
  • Please add the full details to your question. I'm now even more convinced that you want a list-of-lists. [Appending and indexing are both O(1) for Python lists](https://wiki.python.org/moin/TimeComplexity). Concatenating numpy arrays is O(n). – ali_m Nov 29 '15 at 22:32
  • In fact, [here's an example](https://www.python.org/doc/essays/graphs/) from [the BDFL himself](https://en.wikipedia.org/wiki/Guido_van_Rossum), suggesting a graph implementation using Python lists to store adjacency information. – ali_m Nov 29 '15 at 23:55
  • Appending and indexing you are right, searching is not. – Ricardo Silveira Dec 02 '15 at 02:28
  • Also in terms of efficiency, numpy arrays are more memory efficient than python lists. This is why I wanted a numpy array, using lists only where they are really needed. – Ricardo Silveira Dec 02 '15 at 02:31

10 Answers

69

As you discovered, np.array tries to create a 2d array when given something like

 A = np.array([[1,2],[3,4]],dtype=object)

You have to apply some tricks to get around this default behavior.

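One is to make the sublists variable in length. It can't make a 2d array from these, so it resorts to the object array. (Recent NumPy versions, 1.24 and later, raise a ValueError for ragged input unless you pass dtype=object explicitly.)

In [43]: A=np.array([[1,2],[],[1,2,3,4]],dtype=object)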
In [44]: A
Out[44]: array([[1, 2], [], [1, 2, 3, 4]], dtype=object)

And you can then append values to each of those lists:

In [45]: for i in A: i.append(34)
In [46]: A
Out[46]: array([[1, 2, 34], [34], [1, 2, 3, 4, 34]], dtype=object)
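
A minimal sketch contrasting the two cases described above, assuming a recent NumPy where ragged input needs an explicit dtype=object. The names square and ragged are only illustrative:

```python
import numpy as np

# Equal-length sublists: NumPy builds a 2-D array, so indexing the first
# axis returns a row (an ndarray), not a Python list.
square = np.array([[1, 2], [3, 4]], dtype=object)
print(square.shape)        # (2, 2)
print(type(square[0]))     # <class 'numpy.ndarray'>

# Ragged sublists: NumPy cannot form a rectangle, so each sublist is
# stored as one object in a 1-D array.
ragged = np.array([[1, 2], [], [1, 2, 3, 4]], dtype=object)
print(ragged.shape)        # (3,)
print(type(ragged[0]))     # <class 'list'>

ragged[1].append(34)       # list.append works on the stored object
print(ragged[1])           # [34]
```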

np.empty also creates an object array:

In [47]: A=np.empty((3,),dtype=object)
In [48]: A
Out[48]: array([None, None, None], dtype=object)

But you then have to be careful how you change the elements to lists. The array's fill method is tempting, but has problems:

In [49]: A.fill([])
In [50]: A
Out[50]: array([[], [], []], dtype=object)
In [51]: for i in A: i.append(34)
In [52]: A
Out[52]: array([[34, 34, 34], [34, 34, 34], [34, 34, 34]], dtype=object)

It turns out that fill puts the same list in all slots, so modifying one modifies all the others. You can get the same problem with a list of lists:

In [53]: B=[[]]*3
In [54]: B
Out[54]: [[], [], []]
In [55]: for i in B: i.append(34)
In [56]: B
Out[56]: [[34, 34, 34], [34, 34, 34], [34, 34, 34]]

The proper way to initialize the empty A is with an iteration, e.g.

In [65]: A=np.empty((3,),dtype=object)
In [66]: for i,v in enumerate(A): A[i]=[v,i]
In [67]: A
Out[67]: array([[None, 0], [None, 1], [None, 2]], dtype=object)
In [68]: for v in A: v.append(34)
In [69]: A
Out[69]: array([[None, 0, 34], [None, 1, 34], [None, 2, 34]], dtype=object)
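
The iteration can be wrapped in a small helper. This is only a sketch; empty_lists is a hypothetical name, not a NumPy function:

```python
import numpy as np

def empty_lists(n):
    """Return a 1-D object array holding n distinct empty lists."""
    arr = np.empty(n, dtype=object)
    for i in range(n):
        arr[i] = []          # a fresh list per slot
    return arr

A = empty_lists(3)
A[0].append(34)
print(A[0], A[1], A[2])      # [34] [] []
print(A[0] is A[1])          # False
```

Because each slot gets its own list, appending to one element leaves the others untouched.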

It's a little unclear from the question and comments whether you want to append to the lists, or append lists to the array. I've just demonstrated appending to the lists.

There is an np.append function, which new users often misuse. It isn't a substitute for list append. It is a front end to np.concatenate. It is not an in-place operation; it returns a new array.

Also defining a list to add with it can be tricky:

In [72]: np.append(A,[[1,23]])
Out[72]: array([[None, 0, 34], [None, 1, 34], [None, 2, 34], 1, 23], dtype=object)

You need to construct another object array to concatenate to the original, e.g.

In [76]: np.append(A,np.empty((1,),dtype=object))
Out[76]: array([[None, 0, 34], [None, 1, 34], [None, 2, 34], None], dtype=object)
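
Putting those two steps together, here is a hedged sketch of adding one list as a single new element. append_list is a hypothetical helper name; it uses np.concatenate directly, which is what np.append calls:

```python
import numpy as np

def append_list(arr, new_list):
    """Return a new 1-D object array with new_list added as one element."""
    tail = np.empty(1, dtype=object)
    tail[0] = new_list       # index assignment stores the list itself
    return np.concatenate([arr, tail])

A = np.empty(2, dtype=object)
A[0], A[1] = [0], [1]

B = append_list(A, [1, 23])
print(len(A), len(B))        # 2 3
print(B[2])                  # [1, 23]
print(B[0] is A[0])          # True
```

The original array keeps its length, and the new array shares the same list objects, so this copies references on every call rather than growing in place.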

In all of this, an array of lists is harder to construct than a list of lists, and no easier, or faster, to manipulate. You have to make it a 2d array of lists to derive some benefit.

In [78]: A[:,None]
Out[78]: 
array([[[None, 0, 34]],
       [[None, 1, 34]],
       [[None, 2, 34]]], dtype=object)

You can reshape, transpose, etc. an object array, whereas creating and manipulating a list of lists of lists gets more complicated.

In [79]: A[:,None].tolist()
Out[79]: [[[None, 0, 34]], [[None, 1, 34]], [[None, 2, 34]]]
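
A short sketch of that benefit, with illustrative names (cells, grid). Reshaping gives a new layout over the same list objects:

```python
import numpy as np

# Build a 1-D object array of six distinct lists.
cells = np.empty(6, dtype=object)
for i in range(cells.size):
    cells[i] = [i]

grid = cells.reshape(2, 3)   # new layout, same list objects
print(grid.shape)            # (2, 3)
print(grid.T[2, 1])          # [5]

grid[0, 1].append(99)        # mutating a list through the reshaped array...
print(cells[1])              # [1, 99]  ...is visible in the original
```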

===

As shown in https://stackoverflow.com/a/57364472/901925, np.frompyfunc is a good tool for creating an array of objects.

np.frompyfunc(list, 0, 1)(np.empty((3,2), dtype=object))  
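
A slightly expanded sketch of the same call. The name make_list is mine; the array passed in acts as the output buffer, and list() is called once per element:

```python
import numpy as np

make_list = np.frompyfunc(list, 0, 1)   # zero inputs, one output
grid = make_list(np.empty((3, 2), dtype=object))

grid[0, 0].append(1)         # only this cell changes
print(grid.shape)            # (3, 2)
print(grid[0, 0], grid[0, 1])
```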
hpaulj
  • After massaging my data, how can I convert this array of arrays back to a regular numpy array? (assuming the inner arrays are now all the same shape) – evn Nov 29 '20 at 05:34
  • @evn, some version of `concatenate` can rejoin arrays into one. `np.stack()` is often a good choice. Make sure the outer array is 1d, because they treat it like a list of arrays. – hpaulj Nov 29 '20 at 07:27
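
A minimal sketch of the np.stack suggestion in the comment above, assuming a 1-d object array whose elements are equal-shaped numeric arrays (rows and dense are illustrative names):

```python
import numpy as np

# A 1-D object array whose elements are equal-shaped numeric arrays.
rows = np.empty(3, dtype=object)
for i in range(3):
    rows[i] = np.arange(4) + 10 * i

dense = np.stack(rows)       # treats rows like a list of arrays
print(dense.shape)           # (3, 4)
print(dense.dtype.kind)      # i
```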
10

If you really need a 1-d array of lists, one option is to wrap your lists in your own class, since by default numpy will try to convert lists inside an array into array dimensions (which is more efficient but obviously requires constant-size elements), for example through

class mylist:

    def __init__(self, l):
        self.l=l

    def __repr__(self): 
        return repr(self.l)

    def append(self, x):
        self.l.append(x)

and then you can change any element without changing the dimension of others

>>> x = mylist([1,2,3])
>>> y = mylist([1,2,3])
>>> import numpy as np
>>> data = np.array([x,y])
>>> data
array([[1, 2, 3], [1, 2, 3]], dtype=object)
>>> data[0].append(2)
>>> data
array([[1, 2, 3, 2], [1, 2, 3]], dtype=object)

Update

As suggested by ali_m there is actually a way to force numpy to simply create a 1-d array of references and then fill it with actual lists (np.object has been removed from recent NumPy versions, so the builtin object is used here)

>>> data = np.empty(2, dtype=object)
>>> data[:] = [1, 2, 3], [1, 2, 3]
>>> data
array([[1, 2, 3], [1, 2, 3]], dtype=object)
>>> data[0].append(4)
>>> data
array([[1, 2, 3, 4], [1, 2, 3]], dtype=object)
lejlot
  • In my case I don't know how many mylists I will have, how would it be in a generic way? – Ricardo Silveira Nov 29 '15 at 14:23
  • You could achieve the same result without defining a new class using `data = np.empty(2, dtype=object); data[:] = [1, 2, 3], [1, 2, 3]` – ali_m Nov 29 '15 at 15:18
  • @Ricardo if you do not know how many lists you have then why would you use arrays in the first place? They are **constant size**. If you want to add lists on the go, then probably a list of lists is preferable. Otherwise - you can always use `np.concatenate` to merge arrays – lejlot Nov 29 '15 at 15:38
  • An interesting observation @lejlot, and a cautionary note to others looking for shortcuts: don't forget the `data = np.empty(2, dtype=object)` creation line. If you try to skip a step by using `data = np.array([[1,2,3],[4,5,6]], dtype=object)` and then call `data[0].append(4)`, it raises `AttributeError: 'numpy.ndarray' object has no attribute 'append'` in python 3.4.x. Apparently an array of empty objects needs to be created first, followed by a fill. –  Nov 29 '15 at 15:51
  • `np.append` is just an alternative way of calling `np.concatenate`. It is not a clone of list append. – hpaulj Nov 29 '15 at 16:00
  • Well, I need to start a vector/array with N elements, each element will have a list of elements and they may vary size, and these lists I will append to. – Ricardo Silveira Nov 29 '15 at 22:14
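
For the adjacency-list use case described in these comments, a hedged sketch might look like the following. build_adjacency is a hypothetical helper, the graph is assumed undirected, and vertices are assumed to be numbered 0..n-1:

```python
import numpy as np

def build_adjacency(n, edges):
    """Adjacency lists for an undirected graph with vertices 0..n-1."""
    adj = np.empty(n, dtype=object)
    for v in range(n):
        adj[v] = []              # one distinct list per vertex
    for u, v in edges:
        adj[u].append(v)         # amortized O(1) per append
        adj[v].append(u)
    return adj

adj = build_adjacency(4, [(0, 1), (0, 2), (2, 3)])
print(adj[0])                    # [1, 2]
print(adj[3])                    # [2]
print([len(nbrs) for nbrs in adj])   # [2, 1, 2, 1]
```

Construction is O(n + m) overall. A plain Python list of lists built the same way has the same complexity, so the object array mainly buys NumPy-style indexing and reshaping of the outer container.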
5
data = np.empty(20, dtype=object)
for i in range(data.shape[0]):
    data[i] = []
    data[i].append(i)
print(data)

The result will be:

[list([0]) list([1]) list([2]) list([3]) list([4]) list([5]) list([6]) list([7]) list([8]) list([9]) list([10]) list([11]) list([12]) list([13]) list([14]) list([15]) list([16]) list([17]) list([18]) list([19])]
desertnaut
Dmitriy
2

A simple way would be:

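import numpy as np
A = [[1,2],[3,4]]
# the trailing [] makes the input ragged; recent NumPy also needs dtype=object
B = np.array(A+[[]], dtype=object)[:-1]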
Nadav
2

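Just found this. I've never answered a question before, but here is a pretty simple solution:

If you want a vector of length n, use:

A = np.array([[] for _ in range(n)] + [[1]], dtype=object)[:-1]

This returns:

array([list([]), list([]), ... , list([])], dtype=object)

The list comprehension matters: [[]]*n would place the same list object in every slot, so appending to one element would change them all. The dtype=object argument is needed because recent NumPy versions refuse ragged input without it.

If instead you want an n by m array, use:

A = np.array([[] for _ in range(n*m)] + [[1]], dtype=object)[:-1]
B = A.reshape((n,m))

For higher rank arrays, you can use a similar method by creating a long vector and reshaping it. This may not be the most efficient way, but it worked for me.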
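
As an alternative sketch for the n by m case that avoids the sentinel element, the cells can be filled one by one. empty_list_grid is a hypothetical helper name:

```python
import numpy as np

def empty_list_grid(n, m):
    """Return an (n, m) object array where every cell is its own list."""
    grid = np.empty((n, m), dtype=object)
    for idx in np.ndindex(n, m):
        grid[idx] = []           # idx is an (i, j) tuple
    return grid

G = empty_list_grid(2, 3)
G[0, 1].append("x")
print(G[0, 1], G[0, 0])          # ['x'] []
```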

Dharman
0

Lists aren't very numpy anyway, so maybe a tuple of lists is good enough for you. You can get that easily and rather efficiently with a generator expression:

fiveLists = tuple([] for _ in range(5))

You can leave out the tuple if you only need it once (gives you the raw generator).

You can use this to create a numpy array if you really want to:

arrayOfLists = np.fromiter(([] for _ in range(5)), object)

Edit: as of July 2020, you get "ValueError: cannot create object arrays from iterator". NumPy 1.23 and later accept dtype=object in np.fromiter.

Chris K
0

If you need to create an array of arrays from a sequence of lists or tuples:

import numpy as np
x=[[1,2],[3,4],[5,6]]
print(type(x))
print(type(x[0]))
#<class 'list'>
#<class 'list'>
# Note: the sublists here have equal length, so the result is a 2-D object
# array of shape (3, 2), and ar[0] is a row of that array.
ar=np.array([np.array(i) for i in x],dtype=object)
print(type(ar))
print(type(ar[0]))
#<class 'numpy.ndarray'>
#<class 'numpy.ndarray'>
code-freeze
0

I realize this is a bit of a workaround if you don't need Pandas but it achieves the stated objective:

import pandas as pd

A = pd.Series([[1, 2], [3, 4]]).to_numpy()
assert isinstance(A[0], list)
Bill
0

Numpy array() does support an ndmin argument which allows you to set the minimum number of dimensions in the output array, but unfortunately does not (yet) support an ndmax argument which would allow this to happen easily.

In the meantime, here is a small function that will create a 1D array from an arbitrarily nested sequence:

from typing import Sequence
import numpy as np

def create_1d_array(seq: Sequence) -> np.ndarray:
    arr = np.empty(len(seq), dtype=object)
    arr[:] = [s for s in seq]
    return arr

>>> create_1d_array([[1, 2], [3, 4]])
array([list([1, 2]), list([3, 4])], dtype=object)
Lee Netherton
0

I had the same problem: the elements of my lists were added to the array as separate elements, not as lists. With the help of @hpaulj I solved it as simply as this:

array_of_lists = np.empty(1, dtype=object)
array_of_lists[0] = first_list
if second_list:
    array_of_lists = np.append(array_of_lists, np.empty(1, dtype=object))
    array_of_lists[1] = second_list
if third_list:
    array_of_lists = np.append(array_of_lists, np.empty(1, dtype=object))
    array_of_lists[2] = third_list

Hope this can help someone.