
I need a way to make a 2D array of tuples where each tuple holds the pair of indices at its position. I need this without for loops, since I'm working with big matrices.

For example, the 3x3 case would be:

array([[(0, 0), (0, 1), (0, 2)],
       [(1, 0), (1, 1), (1, 2)],
       [(2, 0), (2, 1), (2, 2)]], dtype=object)

I know there is numpy.indices, and there is advice online (there is a post asking about this here), but what they suggest basically gives a 3D array. I need a 2D one so I can pass it to a vectorized function (this one here). I need the function to work with the pair of indices, and if I pass it the 3D version mentioned above, each individual index value gets passed to the function instead of the pair.

But this doesn't happen if my indices come as a tuple pair. I tried it with small arrays and it works. The problem is, I can't figure out a way of getting this 2D array of tuples, aside from iterating with for loops. I tried that and it takes too long. But I'm new to programming, so maybe someone knows another way?
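For reference, here is a minimal sketch of the loop version I tried (a small n here, just to show the shape; the real matrices are much larger):

```python
import numpy as np

# Slow loop version: fill an object array with index tuples one cell at a time.
n = 3
pairs = np.empty((n, n), dtype=object)
for i in range(n):
    for j in range(n):
        pairs[i, j] = (i, j)

print(pairs[1, 2])  # (1, 2)
```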

Radusaurus
  • I don't think it is a good idea to use [`np.vectorize`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html): "The vectorize function is provided primarily for convenience, not for performance." Why not re-write your function just using numpy? – Mr. T Mar 07 '18 at 18:20
  • `vectorize` passes scalars to the function. With this `object` dtype the 'scalars' are the tuples, as opposed to the integers in the 3d version. `vectorize` has a `signature` that might be usable with the 3d input. But beware that `vectorize`, in both forms does not speed up your code (relative to explicit Python loops). – hpaulj Mar 07 '18 at 18:33
  • Thanks, I was not aware of that. I thought vectorization was generally faster than looping. Does this hold true for parallel computing as well? In the long run, I want to adapt my code to run on more than one core, and you need vectorized functions for that, don't you? – Radusaurus Mar 08 '18 at 12:59

2 Answers


Here's a list of tuples:

In [137]: idx=np.ndindex(3,3)
In [138]: list(idx)
Out[138]: [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

I mentioned the vectorize signature parameter in the comments. That version uses ndindex like this internally to iterate over the inputs.

Trying to make an array from this list results in an 18-element array, which can be reshaped to (3,3,2). But with a trick we recently discussed (about making object arrays), I can make a 3x3 array of tuples:

In [144]: res = np.empty((3,3),object)
In [145]: for idx in np.ndindex(3,3):
     ...:     res[idx] = idx
     ...:     
In [146]: res
Out[146]: 
array([[(0, 0), (0, 1), (0, 2)],
       [(1, 0), (1, 1), (1, 2)],
       [(2, 0), (2, 1), (2, 2)]], dtype=object)

Making an object-dtype array from lists of equal-size sublists is a bit tricky. np.array tries, where possible, to make a multidimensional array of a basic numeric dtype.
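To illustrate (a minimal example of my own): fed equal-length sublists of pairs, np.array ignores the tuple boundaries and descends all the way down, which is why the empty-plus-fill trick above is needed:

```python
import numpy as np

# np.array collapses a nested list of pairs into a 3D numeric array...
a = np.array([[(0, 0), (0, 1)], [(1, 0), (1, 1)]])
print(a.shape)  # (2, 2, 2)

# ...and even asking for dtype=object doesn't preserve the tuples;
# the pairs are still broken up into a (2, 2, 2) object array.
b = np.array([[(0, 0), (0, 1)], [(1, 0), (1, 1)]], dtype=object)
print(b.shape)  # (2, 2, 2)
```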

And for what it's worth, it's faster to iterate over a list than over an array. An object-dtype array iterates faster than a numeric one, since, like a list, it already contains object pointers.
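A rough timing sketch of that claim (the numbers vary by machine and numpy version, so treat them as illustrative only):

```python
import timeit
import numpy as np

lst = list(np.ndindex(100, 100))          # plain list of index tuples
obj = np.empty((100, 100), dtype=object)  # object array holding the same tuples
for idx in np.ndindex(100, 100):
    obj[idx] = idx
num = np.arange(10000).reshape(100, 100)  # plain numeric array

t_list = timeit.timeit(lambda: [x for x in lst], number=100)
t_obj = timeit.timeit(lambda: [x for x in obj.flat], number=100)
t_num = timeit.timeit(lambda: [x for x in num.flat], number=100)
# Typically t_list < t_obj < t_num, matching the claim above.
print(t_list, t_obj, t_num)
```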


def foo(ij):
    # expects an index pair; prints it, then combines row and column
    print(ij)
    return 4*ij[0] + ij[1]

With the object dtype res (note that vectorize makes an extra trial call on the first element to determine the output dtype, which is why (0, 0) shows up twice below):

In [157]: f1 = np.vectorize(foo)
In [158]: f1(res)
(0, 0)
(0, 0)
(0, 1)
(0, 2)
(1, 0)
(1, 1)
(1, 2)
(2, 0)
(2, 1)
(2, 2)
Out[158]: 
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

With signature and the 3D array, I get the same thing:

In [159]: f=np.vectorize(foo, signature='(n)->()')
In [160]: idx=np.ndindex(3,3)
In [161]: arr = np.array(list(idx)).reshape(3,3,2)
In [162]: f(arr)
[0 0]
[0 1]
[0 2]
[1 0]
[1 1]
[1 2]
[2 0]
[2 1]
[2 2]
Out[162]: 
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

But the best way of getting this array is with whole-array operations:

In [164]: 4*arr[:,:,0]+arr[:,:,1]
Out[164]: 
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])
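For completeness, np.indices produces the row and column grids directly, so the index array doesn't need to be built via ndindex at all:

```python
import numpy as np

# np.indices((3, 3)) returns a (2, 3, 3) stack: row indices, then column indices.
I, J = np.indices((3, 3))
print(4 * I + J)
# [[ 0  1  2]
#  [ 4  5  6]
#  [ 8  9 10]]
```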
hpaulj

This is hard to answer without details on how large an array you need. I suspect nditer would be sufficiently fast for this. The answer here describes why you might want to do this in C instead.
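A sketch of what I have in mind with nditer (just one possible shape for it, and not guaranteed to beat a plain loop):

```python
import numpy as np

n = 3
res = np.empty((n, n), dtype=object)

# Walk a same-shaped dummy array purely to get multi_index at each step,
# then store that index tuple into the object array.
it = np.nditer(np.zeros((n, n)), flags=['multi_index'])
for _ in it:
    res[it.multi_index] = it.multi_index

print(res[2, 1])  # (2, 1)
```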

If something like

import numpy as np
myarray = np.array([[(j, i) for i in range(1000)]
                    for j in range(1000)])

is too slow to even run, it's hard to imagine there's a reasonable Python solution here.
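For what it's worth, an equivalent (1000, 1000, 2) numeric array of index pairs (position (i, j) holding [i, j]) can be built without any Python-level loops:

```python
import numpy as np

n = 1000
# np.indices((n, n)) has shape (2, n, n); moveaxis turns it into (n, n, 2),
# so each position holds its own [row, col] pair.
arr = np.moveaxis(np.indices((n, n)), 0, -1)
print(arr.shape)  # (1000, 1000, 2)
print(arr[3, 7])  # [3 7]
```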

Seth Rothschild
  • nditer doesn't improve speed. – hpaulj Mar 07 '18 at 20:18
  • It's at least 5000 by 5000, which takes >10 s to run. This would not necessarily be a problem if it weren't already inside a for loop with hundreds of steps; it adds up. I assumed it's because of the nested for loops and was looking for a way to avoid that. – Radusaurus Mar 08 '18 at 12:52