
I would like to use itertools' various functions to create numpy arrays. I can easily compute ahead of time the number of elements in the product, combinations, permutations, etc., so allotting space shouldn't be a problem.

For example:

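import itertools
import numpy as np

coords = [[1,2,3],[4,5,6]]
iterable = itertools.product(*coords)
# one row per pair, one column per input list: (9, 2)
shape = (len(coords[0]) * len(coords[1]), len(coords))
arr = np.iterable_to_array(
    iterable,
    shape=shape,
    dtype=np.float64,
    count=shape[0]*shape[1]
) #not a real thing
answer = np.array([
    [1,4],[1,5],[1,6],
    [2,4],[2,5],[2,6],
    [3,4],[3,5],[3,6]])
# np.equal returns an elementwise array; array_equal gives a single bool
assert np.array_equal(arr, answer)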
smci
Him
  • So, is there a reason `arr = np.array(list(iterable))` doesn't work for you? You are probably looking for `np.fromiter`, but it doesn't deal with multidimensional arrays very well, last I tried. – juanpa.arrivillaga Jan 05 '17 at 18:35
  • I could also create a zero array and then fill in the individual values. That would possibly be faster: http://forthescience.org/blog/2015/06/07/performance-of-filling-a-numpy-array/ ... I was just wondering if there was a nice way to have numpy do the work, since iterables pop up all over the place in Python. – Him Jan 05 '17 at 18:40
  • Unfortunately, AFAIK there is only support for building 1-dimensional arrays from iterables. Check out this exchange: https://mail.scipy.org/pipermail/numpy-discussion/2007-August/028898.html Indeed, they suggest using `empty`! – juanpa.arrivillaga Jan 05 '17 at 18:44
  • I suspected as much. Thanks! – Him Jan 05 '17 at 18:48
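
The `empty`-then-fill idea from the comments above can be sketched roughly as follows. The helper name `product_to_array` is made up for illustration and is not part of numpy; the sketch assumes every input list has a known length, so the row count can be computed before iterating.

```python
import itertools
import numpy as np

def product_to_array(coords, dtype=np.float64):
    """Hypothetical helper: fill a preallocated array from itertools.product."""
    # Number of rows is the product of the input lengths.
    n_rows = 1
    for axis in coords:
        n_rows *= len(axis)
    # Allocate once; every row is overwritten below, so np.empty is safe here.
    out = np.empty((n_rows, len(coords)), dtype=dtype)
    for i, row in enumerate(itertools.product(*coords)):
        out[i] = row  # each tuple from product becomes one row
    return out

arr = product_to_array([[1, 2, 3], [4, 5, 6]])
```

The loop still runs in Python, so this mainly avoids building an intermediate list rather than making the fill itself fast.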

1 Answer


Here are several numpy ways of generating an array with these values. The simplest is to build a list from the iterator first:

In [469]: coords = [[1,2,3],[4,5,6]]
In [470]: it = itertools.product(*coords)
In [471]: arr = np.array(list(it))
In [472]: arr
Out[472]: 
array([[1, 4],
       [1, 5],
       [1, 6],
       [2, 4],
       [2, 5],
       [2, 6],
       [3, 4],
       [3, 5],
       [3, 6]])

fromiter will work with an appropriate structured dtype:

In [473]: it = itertools.product(*coords)
In [474]: arr = np.fromiter(it, dtype='i,i')
In [475]: arr
Out[475]: 
array([(1, 4), (1, 5), (1, 6), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5),
       (3, 6)], 
      dtype=[('f0', '<i4'), ('f1', '<i4')])
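
If a plain 2-D array is wanted rather than a structured one, something like the following sketch may work. It assumes both fields share the same integer type with no padding between them, so the buffer can be reinterpreted with `view`; the names `pair`, `rec` and `flat` are only illustrative. Passing `count` lets `fromiter` allocate the output once.

```python
import itertools
import numpy as np

coords = [[1, 2, 3], [4, 5, 6]]
n_rows = len(coords[0]) * len(coords[1])

# Two int32 fields packed back to back (8 bytes per record).
pair = np.dtype([("f0", np.int32), ("f1", np.int32)])

# count tells fromiter how many records to expect up front.
rec = np.fromiter(itertools.product(*coords), dtype=pair, count=n_rows)

# Reinterpret the same memory as plain int32, then give it two columns.
flat = rec.view(np.int32).reshape(n_rows, 2)
```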

But usually we use the tools that numpy provides for generating sequences and meshes. np.arange is used all over the place.

meshgrid is widely used. With a bit of trial and error I found that I could transpose its output, and produce the same sequence:

In [481]: np.transpose(np.meshgrid(coords[0], coords[1], indexing='ij'), (1,2,0)).reshape(-1,2)
Out[481]: 
array([[1, 4],
       [1, 5],
       [1, 6],
       [2, 4],
       [2, 5],
       [2, 6],
       [3, 4],
       [3, 5],
       [3, 6]])
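
The same meshgrid idea can be written without hard-coding two inputs. This is a sketch rather than an established recipe, and `cartesian_product` is a name chosen here for illustration: stacking the `'ij'`-indexed grids along a new last axis and flattening the leading axes should give rows in the same order as `itertools.product`.

```python
import itertools
import numpy as np

def cartesian_product(*axes):
    """Illustrative helper: rows in the same order as itertools.product(*axes)."""
    # indexing='ij' keeps the first input as the slowest-varying axis.
    grids = np.meshgrid(*axes, indexing="ij")
    # Stack to shape (len(a0), len(a1), ..., n_axes), then flatten to rows.
    return np.stack(grids, axis=-1).reshape(-1, len(axes))

rows = cartesian_product([1, 2, 3], [4, 5, 6], [7, 8])
```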

repeat and tile are also useful for tasks like this:

In [487]: np.column_stack((np.repeat(coords[0],3), np.tile(coords[1],3)))
Out[487]: 
array([[1, 4],
       [1, 5],
       [1, 6],
       [2, 4],
       [2, 5],
       [2, 6],
       [3, 4],
       [3, 5],
       [3, 6]])
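
The literal `3` in the call above happens to be the length of both lists. For inputs of different lengths, a sketch along these lines may be clearer: each element of the first list is repeated once per element of the second, and the second list is tiled once per element of the first. The helper name `pairs_2d` is made up for this example.

```python
import itertools
import numpy as np

def pairs_2d(a, b):
    """Illustrative helper: all (a_i, b_j) pairs, first column varying slowest."""
    a = np.asarray(a)
    b = np.asarray(b)
    first = np.repeat(a, len(b))   # e.g. [1, 1, 1, 2, 2, 2]
    second = np.tile(b, len(a))    # e.g. [4, 5, 6, 4, 5, 6]
    return np.column_stack((first, second))

pairs = pairs_2d([1, 2], [4, 5, 6])
```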

I've done some timings on fromiter in the past. My memory is that it offers only a modest time savings over np.array.

A while back I explored itertools and fromiter, and found a way to combine them using itertools.chain (see "convert itertools array into numpy array"):

In [499]: it = itertools.product(*coords)
In [500]: arr = np.fromiter(itertools.chain(*it),int).reshape(-1,2)
In [501]: arr
Out[501]: 
array([[1, 4],
       [1, 5],
       [1, 6],
       [2, 4],
       [2, 5],
       [2, 6],
       [3, 4],
       [3, 5],
       [3, 6]])
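
A variation on the chain approach, sketched under the assumption that the input lengths are known in advance: `itertools.chain.from_iterable` flattens the tuples lazily instead of unpacking the whole iterator with `*`, and `count` lets `fromiter` size the output up front.

```python
import itertools
import numpy as np

coords = [[1, 2, 3], [4, 5, 6]]
n_rows = len(coords[0]) * len(coords[1])
n_cols = len(coords)

it = itertools.product(*coords)
# from_iterable consumes one tuple at a time rather than unpacking them all.
flat = itertools.chain.from_iterable(it)
# count is the total number of scalars, not the number of rows.
arr = np.fromiter(flat, dtype=int, count=n_rows * n_cols).reshape(n_rows, n_cols)
```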
hpaulj