8

I'm creating this array:

A=itertools.combinations(range(6),2)

and I have to manipulate this array with numpy, like:

A.reshape(..

If the dimensions is A is high, the command list(A) is too slow.

How can I "convert" an itertools array into a numpy array?

Update 1: I've tried the solution of hpaulj, in this specific situation is a little bit slower, any idea?

start=time.clock()

A=it.combinations(range(495),3)
A=np.array(list(A))
print A

stop=time.clock()
print stop-start
start=time.clock()

A=np.fromiter(it.chain(*it.combinations(range(495),3)),dtype=int).reshape (-1,3)
print A

stop=time.clock()
print stop-start

Results:

[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
10.323822
[[  0   1   2]
 [  0   1   3]
 [  0   1   4]
 ..., 
 [491 492 494]
 [491 493 494]
 [492 493 494]]
12.289898
stef_B.
  • 99
  • 1
  • 4
  • Hello, where is your question? – Kotshi Oct 22 '15 at 13:40
  • How can i "convert" an itertools array into a numpy array? – stef_B. Oct 22 '15 at 13:41
  • 4
    http://stackoverflow.com/questions/367565/how-do-i-build-a-numpy-array-from-a-generator – postelrich Oct 22 '15 at 13:48
  • Are you sure it's not "too slow" because the number of combinations is excessively large? If you're trying to create a billion elements or something, that's always going to take a while. The `itertools.combinations` call returns immediately because it doesn't actually create any of the combinations up front, it's a generator. – Blckknght Oct 22 '15 at 16:56

2 Answers2

6

I'm reopening this because I dislike the linked answer. The accepted answer suggests using

np.array(list(A))  # producing a (15,2) array

But the OP aparently has already tried list(A), and found it to be slow.

Another answer suggests using np.fromiter. But buried in its comments is the note that fromiter requires a 1d array.

In [102]: A=itertools.combinations(range(6),2)
In [103]: np.fromiter(A,dtype=int)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-103-29db40e69c08> in <module>()
----> 1 np.fromiter(A,dtype=int)

ValueError: setting an array element with a sequence.

So using fromiter with this itertools requires somehow flattening the iterator.

A quick set of timings suggests that list isn't the slow step. It's converting the list to an array that is slow:

In [104]: timeit itertools.combinations(range(6),2)
1000000 loops, best of 3: 1.1 µs per loop
In [105]: timeit list(itertools.combinations(range(6),2))
100000 loops, best of 3: 3.1 µs per loop
In [106]: timeit np.array(list(itertools.combinations(range(6),2)))
100000 loops, best of 3: 14.7 µs per loop

I think the fastest way to use fromiter is to flatten the combinations with an idiomatic use of itertools.chain:

In [112]: timeit
np.fromiter(itertools.chain(*itertools.combinations(range(6),2)),dtype=int)
   .reshape(-1,2)
100000 loops, best of 3: 12.1 µs per loop

Not much of a time savings, at least on this small size. (fromiter also takes a count, which shaves off another µs. With a larger case, range(60), the fromiter takes half the time of array.


A quick search on [numpy] itertools turns up a number of suggestions of pure numpy ways of generating all combinations. itertools is fast, for generating pure Python structures, but converting those to arrays is a slow step.


A picky point about the question.

A is a generator, not an array. list(A) does produce a nested list, that can be described loosely as an array. But it isn't a np.array, and does not have a reshape method.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • You can squeeze a bit more performance out of `np.fromiter` by specifying the size of the final array , which can be computed using `scipy.special.binom(6, 2)` – ali_m Oct 22 '15 at 17:04
  • @hpaulj I've tried your solution, please see update in the question – stef_B. Oct 23 '15 at 07:21
  • There are pure numpy ways of generating combinations that may be faster. @all_m suggests one using `triu`. I believe others have been proposed in previous SO questions. – hpaulj Oct 23 '15 at 17:18
  • `A quick search on [numpy] itertools turns up a number of suggestions of pure numpy ways of generating all combinations.` @hpaulj Would you mind linking some of those, cause I couldn't find any? – winkmal Oct 01 '20 at 15:02
  • https://stackoverflow.com/search?q=%5Bnumpy%5D+itertools – hpaulj Oct 01 '20 at 16:03
3

An alternative way to get every pairwise combination of N elements is to generate the indices of the upper triangle of an (N, N) matrix using np.triu_indices(N, k=1), e.g.:

np.vstack(np.triu_indices(6, k=1)).T

For small arrays, itertools.combinations is going to win, but for large N the triu_indices trick can be substantially quicker:

In [1]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(6), 2)), np.int)
The slowest run took 10.46 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 4.04 µs per loop

In [2]: %timeit np.array(np.triu_indices(6, 1)).T
The slowest run took 10.97 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 22.3 µs per loop

In [3]: %timeit np.fromiter(itertools.chain.from_iterable(itertools.combinations(range(1000), 2)), np.int)
10 loops, best of 3: 69.7 ms per loop

In [4]: %timeit np.array(np.triu_indices(1000, 1)).T
100 loops, best of 3: 10.6 ms per loop
ali_m
  • 71,714
  • 23
  • 223
  • 298
  • I think that solution doesn't generate more than combination of two elements – stef_B. Oct 23 '15 at 07:23
  • Yes, I mentioned it because your original question was about combinations of two elements. I think it may be possible to generalize this approach to deal with combinations of more than two elements, but would require a bit more thought. – ali_m Oct 23 '15 at 08:05
  • I didn't know about the `chain.fromiterable`. For large cases it's twice as fast as `chain(*...)`. – hpaulj Oct 23 '15 at 17:14