How to generate all possible pairs of coordinates without repetition in numpy efficiently

Question

I am trying to generate all pairs of coordinates for pixels in an image with colors, without the pairs repeating (order doesn't matter, so ((1,1,1), (2,2,2) is the same as ((2,2,2), (1,1,1)) and we want to include this pair only once). Also it's important to me that the coordinates are stored in a numpy array.

Let's assume i have a 10x10 image. This means that the image has 100 pixels with 3 color channels what equals to 300 coordinates. This gives us 300*299/2 unique pairs of coordinates. Both using itertools.combinations() or normal python iteration , and then converting to np.array, is painstakingly slow with bigger images (on my pc for 32x32x3 image it takes 5s).

I am able to create a list of all pixels using

all_pixels = np.array(np.meshgrid(range(10), range(10), range(3))).T.reshape(-1, 3)

but that's because we don't have to consider repetitions. Doing that but trying to create pairs of pixels gives me duplicates. I guess i could remove duplicates in some smart way but i have no idea how to do it in an efficient way.

Any help will be greatly appreciated.

It's a bit crude , but for reference this is how i do this now:

    start = time.time()
    x, y, z = shape
    all_pixels = []
    for i in range(x):
        for j in range(y):
            if z > 1:
                for k in range(z):
                    all_pixels.append([i, j, k])
            else:
                all_pixels.append([i, j])
    first_pix = []
    second_pix = []
    for i in range(len(all_pixels)):
        first = all_pixels[i]
        for j in all_pixels[i+1:]:
            second = j
            first_pix.append(first)
            second_pix.append(second)
    print("generation of pixels took " + str(time.time() - start))
    return np.array(first_pix), np.array(second_pix)

There is a missing `)` before the `.T.reshape(-1, 3)` @Divakar — Adirio, Aug 27 '19 at 08:29
`all_pixels = np.array(np.meshgrid(range(10), range(10), range(3))).T.reshape(-1, 3)` — Adirio, Aug 27 '19 at 08:32
Sackhorn, do you really need to include the channel as a third dimension to the coordinates? I think you can take into account the channels after doing the pairing as the number of channels will be constant for every pixel of both images while the width and height won't. This will make your problem a bit faster as you are dividing by 3 the number of coordinates. It may not seem much but to creating pairs is O(n^2) and if you want to remove duplicates it will be O(n^3), so dividing n by 3 is like dividing the complexity by 27. — Adirio, Aug 27 '19 at 08:38
It would be nice to have channel as a third dimension, but i can work around that so they are not necessary. — Sackhorn, Aug 27 '19 at 08:43
If you add the color channel later the number of combinations will not be `(300 * 299) / 2` but rather `(100 * 99) / 2)`. — Laurens Koppenol, Aug 27 '19 at 08:46
Hey i don't. I will look at it later today, cause i am at work — Sackhorn, Aug 27 '19 at 13:50

Paul Panzer · Accepted Answer · 2019-08-27T09:08:28.873

Here is a straightforward numpy method, not sure how fast it is:

shape = 10,10,3
np.stack([*map(np.transpose, map(np.unravel_index, np.triu_indices(np.prod(shape),1), 2*(shape,)))],-2)

Output:

array([[[0, 0, 0],
        [0, 0, 1]],

       [[0, 0, 0],
        [0, 0, 2]],

       [[0, 0, 0],
        [0, 1, 0]],

       ...,

       [[9, 9, 0],
        [9, 9, 1]],

       [[9, 9, 0],
        [9, 9, 2]],

       [[9, 9, 1],
        [9, 9, 2]]])

Update: Same idea, same result but faster

np.column_stack(np.unravel_index(np.arange(np.prod(shape)),shape))[np.column_stack(np.triu_indices(np.prod(shape),1))]

Looks great. Let me check your answer and i will accept it. – Sackhorn Aug 27 '19 at 09:14 — Sackhorn, Aug 27 '19 at 09:14

Laurens Koppenol · Answer 2 · 2019-08-27T09:09:33.163

Below example is cooler but slower than OP's solution :(

%%timeit
first_pix = []
second_pix = []
for i in range(len(pixels)):
    first = pixels[i]
    for j in pixels[i+1:]:
        second = j
        first_pix.append(first)
        second_pix.append(second)

3.57 ms ± 59.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
mixed = {frozenset((i, j)) for i in pixels for j in pixels if i != j}

36.5 ms ± 1.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Code

Create a list of all pixels, but we use tuples so they are hashable. We will see why later.

img_width = 10
img_height = 10
img_colors = 3

pixels = [(x, y, c) for x in range(img_width) for y in range(img_height) for c in range(3)]

Now we use sets to make sure we have no duplicates.

mixed = {frozenset((i, j)) for i in pixels for j in pixels if i != j}

Now we check that it has the correct number of values:

>>> desired_length = (len(pixels) * (len(pixels) - 1)) / 2
>>> assert len(mixed) == desired_length
True

Explanation

We use a 2 dimensional set comprehension to create permutations. It has the following format:

{(x, y) for x in xs for y in ys}

Because this is a set all items in it will be unique. This requires everything in the set to be hashable i.e. comparable to eachother for python

Because we do not only want the pixel combinations to be unique, we also want them to be order indepentently unique. Therefore we use a set again, but since a normal set is not hashable we use the internal type frozenset. Which is now actually a set of tuples. Which is hashable and order independent.

>>> frozenset([(0, 0, 1), (0, 0 , 2)]) == frozenset([(0, 0, 2), (0, 0 , 1)])
True

We have to add i != j to make sure we do not enter the same coordinates in a frozenset twice, resulting in an outcome of length 1. (for example set([(0, 0, 1), (0, 0, 1)]) equals {(0, 0, 1)}

>>> frozenset([(0, 0, 1), (0, 0 , 1)])
frozenset({(0, 0, 1)})

Full code

%%timeit

img_width = 10
img_height = 10
img_colors = 3

pixels = [(x, y, c) for x in range(img_width) for y in range(img_height) for c in range(3)]
mixed = {frozenset((i, j)) for i in pixels for j in pixels if i != j}

38.2 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Your code works properly, but the issue is that a normal python for loop and using python collection to generate all pairs takes at my pc around 5s for 32x32x3 with converting the python collection to np.array. Your method takes around 9s without conversion. — Sackhorn, Aug 27 '19 at 08:57
Are you sure? On my laptop, not state of the art, I get `35.7 ms ± 544 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)` — Laurens Koppenol, Aug 27 '19 at 09:01
Well my pc isn't great either. Let me check once again. But i am pretty sure. I posted the way i run it now in my question. — Sackhorn, Aug 27 '19 at 09:04
your own method actually takes `3 ms` on my laptop.... updated answer to reflect this — Laurens Koppenol, Aug 27 '19 at 09:10

How to generate all possible pairs of coordinates without repetition in numpy efficiently

2 Answers2

Below example is cooler but slower than OP's solution :(

Code

Explanation

Full code