I am working with large arrays representing a grid; each element is a Cell object with x,y attributes.

I am not sure of the most efficient way to initialize the arrays. My basic implementation is:

import numpy as np

# X,Y dimensions of grid:
Gx = 3000
Gy = 4000

# Array to create (object dtype, initially filled with None)
A = np.empty((Gx, Gy), dtype=object)

for y in range(Gy):
    for x in range(Gx):
        c = Cell(1, x, y, 1)
        A[x, y] = c   # item assignment; ndarray.itemset was removed in NumPy 2.0

Clearly, this is not efficient for large arrays. I know how to create a large array of objects and use np.vectorize to access them all at once. What I can't figure out is how to apply an array of indices (via np.indices) in a single function call that doesn't require iterating over the entire array in Python.
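For reference, the index arrays mentioned above come from the function `np.indices` (an ndarray has no `indices` attribute). A minimal sketch of what it returns for a small 3×4 grid:

```python
import numpy as np

# np.indices returns one integer array per dimension, stacked along axis 0.
grid = np.indices((3, 4))
xs, ys = grid          # xs[i, j] == i, ys[i, j] == j

print(grid.shape)      # (2, 3, 4)
print(xs)
print(ys)
```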

Each Cell object does have setX and setY methods. Can I pass the array of indices to these methods to set each cell's y value in a single line?

  • Please give us a minimal working example. We don't know what Gy and Gx are and why you always create the list R without using it. – Molitoris Oct 28 '18 at 17:39
  • 'Efficient' in `numpy` means doing stuff in compiled numpy code, which is built around numeric dtypes. Your array of objects is `object` dtype. `numpy` iterates over those objects much like Python does with a list of the same - but numpy's iteration is slower. We might be able to suggest improvements to a working list-based example, but can't promise numpy-like efficiency. – hpaulj Oct 28 '18 at 17:46
  • Related post: https://stackoverflow.com/questions/32831839/combining-features-of-array-of-objects-with-object-of-arrays; https://stackoverflow.com/questions/42067429/access-elements-from-array-of-arrays-call-function-to-execute-array-of-arrays – hpaulj Oct 28 '18 at 18:06
  • Reviewing my earlier answers, it's apparent that `np.frompyfunc` is the fastest tool for iterating over an array of objects. It can be used to create objects, and can be used to access attributes and methods. Speed is comparable to a well-written list comprehension over the same number of objects. – hpaulj Oct 28 '18 at 18:26
  • Updated the code to a minimal working example. Can you give an example of using np.frompyfunc? – Daryl Kowalski Oct 28 '18 at 18:36
  • I don't see a minimal working example, not here, not in your next question, https://stackoverflow.com/questions/53055726/how-does-frompyfunc-iterate-over-arrays. And in answer to that question, `frompyfunc` iterates over all elements of the input arrays (broadcasting as needed), calling your function once for each 'scalar' set of values. It does not pass slices or rows of 2d arrays. – hpaulj Oct 30 '18 at 05:49
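The last comment's point, that `frompyfunc` calls the wrapped function once per broadcast element rather than once per row, can be checked with a small counter. The function name `record` is only illustrative:

```python
import numpy as np

calls = []

def record(a, b):
    # Each call receives one scalar from each (broadcast) input.
    calls.append((a, b))
    return a + b

f = np.frompyfunc(record, 2, 1)
out = f(np.arange(3)[:, None], np.arange(4))   # shapes (3,1) and (4,) broadcast to (3,4)

print(out.shape, out.dtype)   # (3, 4) object
print(len(calls))             # 12: one call per element, not per row
```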

1 Answer

Define a simple class:

class Cell():
    def __init__(self,x,y):
        self.x=x
        self.y=y
    def setX(self,x):
        self.x=x
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

A way of creating an array of these objects:

In [653]: f = np.frompyfunc(Cell, 2, 1)
In [654]: arr = f(np.arange(3)[:,None], np.arange(4))
In [655]: arr
Out[655]: 
array([[Cell(0,0), Cell(0,1), Cell(0,2), Cell(0,3)],
       [Cell(1,0), Cell(1,1), Cell(1,2), Cell(1,3)],
       [Cell(2,0), Cell(2,1), Cell(2,2), Cell(2,3)]], dtype=object)
In [656]: arr.shape
Out[656]: (3, 4)
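The coordinate arrays can also come from `np.indices`, which is one way to feed the whole index grid to `Cell` in a single call, close to what the question asks for. A sketch, re-declaring the answer's `Cell` class so it runs on its own:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

make_cell = np.frompyfunc(Cell, 2, 1)

# np.indices((3, 4)) unpacks into full (3, 4) x- and y-coordinate arrays.
arr = make_cell(*np.indices((3, 4)))

print(arr.shape)   # (3, 4)
print(arr[1, 2])   # Cell(1,2)
```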

A list-comprehension way of creating the same objects (note that this nesting produces 4 lists of 3, the transpose of arr's layout):

In [658]: [[Cell(i,j) for i in range(3)] for j in range(4)]
Out[658]: 
[[Cell(0,0), Cell(1,0), Cell(2,0)],
 [Cell(0,1), Cell(1,1), Cell(2,1)],
 [Cell(0,2), Cell(1,2), Cell(2,2)],
 [Cell(0,3), Cell(1,3), Cell(2,3)]]
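To get the same (3, 4) layout as `arr` from a list comprehension, the loops can be nested the other way round and the result wrapped in an object array. A sketch, re-declaring the answer's `Cell` class:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

# Outer loop over the first axis (i), inner loop over the second (j).
cells = [[Cell(i, j) for j in range(4)] for i in range(3)]

# Cell objects are not sequences, so np.array stops at the two list levels.
arr = np.array(cells, dtype=object)

print(arr.shape)   # (3, 4)
print(arr[2, 3])   # Cell(2,3)
```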

Some comparative timings:

In [659]: timeit arr = f(np.arange(3)[:,None], np.arange(4))
13.5 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [660]: timeit [[Cell(i,j) for i in range(3)] for j in range(4)]
8.3 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [661]: timeit arr = f(np.arange(300)[:,None], np.arange(400))
64.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [662]: timeit [[Cell(i,j) for i in range(300)] for j in range(400)]
78 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

For large sets, the frompyfunc approach has a modest speed advantage.
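The timings above use IPython's `%timeit`. A rough way to repeat the comparison in a plain script is the standard-library `timeit` module; the helper names are illustrative, and the absolute numbers will depend on the machine and NumPy version:

```python
import timeit
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y

f = np.frompyfunc(Cell, 2, 1)

def with_frompyfunc(n, m):
    return f(np.arange(n)[:, None], np.arange(m))

def with_listcomp(n, m):
    return [[Cell(i, j) for i in range(n)] for j in range(m)]

if __name__ == '__main__':
    for n, m in [(3, 4), (300, 400)]:
        # Best of 3 repeats, each timing 3 calls; report per-call milliseconds.
        t1 = min(timeit.repeat(lambda: with_frompyfunc(n, m), number=3, repeat=3)) / 3
        t2 = min(timeit.repeat(lambda: with_listcomp(n, m), number=3, repeat=3)) / 3
        print(f'{n}x{m}: frompyfunc {t1 * 1e3:.3f} ms, list comp {t2 * 1e3:.3f} ms')
```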

Fetching the x values from all cells:

In [664]: np.frompyfunc(lambda c: c.x, 1, 1)(arr)
Out[664]: 
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]], dtype=object)
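The result is still object dtype. If numeric arrays are wanted for further NumPy work, both coordinates can be fetched in one `frompyfunc` call (with `nout=2`) and converted with `astype`. A sketch, re-declaring a minimal `Cell`:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# With nout=2 the wrapped function returns a pair, and frompyfunc
# returns a tuple of two object arrays.
get_xy = np.frompyfunc(lambda c: (c.x, c.y), 1, 2)
xs, ys = get_xy(arr)

# Convert from object dtype to regular integer arrays.
xs = xs.astype(int)
ys = ys.astype(int)

print(xs.dtype, ys.dtype)
```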

Using the setX method:

In [665]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(12).reshape(3,4))
Out[665]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [666]: arr
Out[666]: 
array([[Cell(0,0), Cell(1,1), Cell(2,2), Cell(3,3)],
       [Cell(4,0), Cell(5,1), Cell(6,2), Cell(7,3)],
       [Cell(8,0), Cell(9,1), Cell(10,2), Cell(11,3)]], dtype=object)

setX doesn't return anything, so the array produced by the function call is all None. But it has modified every element of arr. As with list comprehensions, we don't normally use frompyfunc calls for their side effects, but it is possible.
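The question also asks about setting each cell's y value. The `Cell` class here has no `setY`, so the sketch below adds a hypothetical one mirroring `setX` and drives it the same way; passing a (4,) array relies on broadcasting it across the 3 rows of the (3, 4) object array:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setY(self, y):
        # Hypothetical setter, mirroring setX.
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.zeros(4, dtype=int))

# Broadcast a (4,) array of y values across all rows; the returned array is all None.
np.frompyfunc(Cell.setY, 2, 1)(arr, np.arange(4))

print(arr)
```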

np.vectorize, in its default (and original) form, just uses frompyfunc and adjusts the dtype of the result; frompyfunc itself always returns object dtype. Newer versions of vectorize have a signature parameter, which lets us pass arrays (as opposed to scalars) to the function and get arrays back. But this processing is even slower.
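A sketch of that dtype difference: `frompyfunc` returns object dtype, while `np.vectorize` with an explicit `otypes` returns a numeric array. The `get_x` lambda is only illustrative:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

get_x = lambda c: c.x

via_frompyfunc = np.frompyfunc(get_x, 1, 1)(arr)        # object dtype
via_vectorize = np.vectorize(get_x, otypes=[int])(arr)  # integer dtype

print(via_frompyfunc.dtype, via_vectorize.dtype)
```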

Defining arrays of objects like this may make your code look cleaner and better organized, but they can never match numeric numpy arrays in terms of speed.
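For comparison, a sketch of the numeric alternative: keep x and y as separate integer arrays (here from `np.indices`) and update them with whole-array operations instead of per-object method calls. A small grid is used so it runs quickly; the same code works for the question's 3000×4000, and the `+= 1` update is just an example:

```python
import numpy as np

Gx, Gy = 300, 400

# Two (Gx, Gy) integer arrays replace the Gx*Gy Cell objects.
X, Y = np.indices((Gx, Gy))

# A "setY" over the whole grid is a single vectorized operation.
Y += 1

print(X.shape, X[10, 20], Y[10, 20])   # (300, 400) 10 21
```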


Given the definition of Cell, I can set the attributes to arrays, e.g.

Cell(np.arange(3), np.zeros((3,4)))

But to set the x values of an array of Cells to arrays, I have to construct an object array (of arrays) first:

In [676]: X = np.zeros(3, object)
In [677]: for i,row in enumerate(np.arange(6).reshape(3,2)): X[i]=row
In [678]: X
Out[678]: array([array([0, 1]), array([2, 3]), array([4, 5])], dtype=object)
In [679]: np.frompyfunc(Cell.setX, 2, 1)(arr, X[:,None])
Out[679]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [680]: arr
Out[680]: 
array([[Cell([0 1],0), Cell([0 1],1), Cell([0 1],2), Cell([0 1],3)],
       [Cell([2 3],0), Cell([2 3],1), Cell([2 3],2), Cell([2 3],3)],
       [Cell([4 5],0), Cell([4 5],1), Cell([4 5],2), Cell([4 5],3)]],
      dtype=object)

I could not pass a (3,2) numeric array directly, because it does not broadcast against arr's (3,4) shape:

In [681]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(6).reshape(3,2))
ValueError: operands could not be broadcast together with shapes (3,4) (3,2) 
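The error follows NumPy's broadcasting rules: dimensions are compared from the right, and each pair must be equal or one of them must be 1. A short sketch contrasting a (3, 1) array, which broadcasts against the (3, 4) object array, with a (3, 2) array, which does not:

```python
import numpy as np

class Cell():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setX(self, x):
        self.x = x

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
set_x = np.frompyfunc(Cell.setX, 2, 1)

# (3, 1) broadcasts to (3, 4): every cell in row i gets the same scalar.
set_x(arr, np.array([[10], [20], [30]]))

try:
    set_x(arr, np.arange(6).reshape(3, 2))   # (3, 2) vs (3, 4): incompatible
    failed = False
except ValueError:
    failed = True

print(arr[1, 3].x, failed)   # 20 True
```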

numpy preferentially works with multidimensional (numeric) arrays. Creating and using object-dtype arrays requires some special tricks.

hpaulj