0

I want to get all unique combinations of a numpy.array vector (or a pandas.Series). I used itertools.combinations but it's very slow. For an array of size (1000,) it takes many hours. Here is my code using itertools (actually I use combination differences):

def a(array):
    temp = pd.Series([])
    for i in itertools.combinations(array, 2):
        temp = temp.append(pd.Series(np.abs(i[0]-i[1])))
    temp.index=range(len(temp))
    return temp

As you see there is no repetition!! The sklearn.utils.extmath.cartesian is really fast and good but it provides repetitions which I do not want! I need help rewriting above function without using itertools and much more speed for large vectors.

Community
  • 1
  • 1
kasra545
  • 37
  • 2
  • 7
  • Is this what you're after: http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays if so I will close as a duplicate – EdChum Nov 09 '15 at 15:35
  • No because It doesn't provide unique combinations! itertools itself is very slow! – kasra545 Nov 09 '15 at 15:38
  • ensure the array is unque to start and mask out values which are equal (to themselves / maybe add them back in as a special case)? – Andy Hayden Nov 09 '15 at 15:43
  • If you mean vector array, it is unique because it's elements are random float64 numbers. Actually I do not understand your suggestion! – kasra545 Nov 09 '15 at 15:51
  • Another possible solution(s) to your problem is http://stackoverflow.com/questions/11144513/numpy-cartesian-product-of-x-and-y-array-points-into-single-array-of-2d-points – Sergey Bushmanov Nov 09 '15 at 16:02

2 Answers2

3

You could take the upper triangular part of a matrix formed on the Cartesian product with the binary operation (here subtraction, as in your example):

import numpy as np
n = 3
a = np.random.randn(n)
print(a)
print(a - a[:, np.newaxis])
print((a - a[:, np.newaxis])[np.triu_indices(n, 1)])

gives

[ 0.04248369 -0.80162228 -0.44504522]
[[ 0.         -0.84410597 -0.48752891]
 [ 0.84410597  0.          0.35657707]
 [ 0.48752891 -0.35657707  0.        ]]
[-0.84410597 -0.48752891  0.35657707]

with n=1000 (and output piped to /dev/null) this runs in 0.131s on my relatively modest laptop.

Rory Yorke
  • 2,166
  • 13
  • 13
2

For a random array of ints:

import numpy as np
import pandas as pd
import itertools as it

b = np.random.randint(0, 8, ((6,)))
# array([7, 0, 6, 7, 1, 5])
pd.Series(list(it.combinations(np.unique(b), 2)))

it returns:

0    (0, 1)
1    (0, 5)
2    (0, 6)
3    (0, 7)
4    (1, 5)
5    (1, 6)
6    (1, 7)
7    (5, 6)
8    (5, 7)
9    (6, 7)
dtype: object
Jaroslav Bezděk
  • 6,967
  • 6
  • 29
  • 46
Lee
  • 29,398
  • 28
  • 117
  • 170