Find unique pairs of array with Python

Question

I'm searching for a pythonic way to do this operation faster

import numpy as np
von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
zu_knoten =  np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])
try:
    for i in range(0,len(von_knoten)-1):
        for j in range(0,len(von_knoten)-1):
            if (i != j) & ([von_knoten[i],zu_knoten[i]] == [zu_knoten[j],von_knoten[j]]):
                    print(str(i)+".column equal " +str(j)+".column")
                    von_knoten = sp.delete(von_knoten , j)
                    zu_knoten = sp.delete(zu_knoten , j)
                    print(von_knoten)
                    print(zu_knoten)
except:
    print('end')

so I need the fastest way to get

[0 0 1 1 4]
[1 2 2 3 2]

from

[0 0 1 1 1 2 2 2 3 4]
[1 2 0 2 3 0 1 4 1 2]

Thanks ;)

The entries of the arrays are pairs. Like the first entry is 1 and 0. So 1 and 0 are Pairs, therefor i want to delete the third entry of both arrays because 0 and 1 is the same pair. — Search898, Nov 28 '17 at 12:28
Why are you using `try ... except`? Are you expecting an exception? And why are you using a bitwise operator `&` instead of the logical operator `and`? — Tom Karzes, Nov 28 '17 at 12:46
Also, are you aware that you're skipping the last array entries? Do you understand how ranges work in python? Hint: range(0, 5) is 0 through 4. It excludes 5. — Tom Karzes, Nov 28 '17 at 12:47
If speed is your concern, this is a classic example in which PyPy can give you a serious boost without touching the code. — alec_djinn, Nov 28 '17 at 12:53
@Divakar i was wondering too why [4,2] choosen over [2,4]... :/ — Search898, Nov 28 '17 at 13:17
It should be the first instance of the repeated sets that are returned? — Daniel F, Nov 28 '17 at 14:04
@Divakar yes it should be [2,4]. I've tested your code and it's really clever! How you find that way to code in sense of vectorized? I'm really fascinated! It's really realy fast — Search898, Dec 02 '17 at 20:18
@Search898 Just some experience working with NumPy helped I guess :) If your problem has been solved, don't forget to accept the one that helped the most. You can also upvote solutions. More info on those here - https://stackoverflow.com/help/someone-answers. — Divakar, Dec 02 '17 at 20:21
@DanielF I've not got your question. It should only find unique pairs without looking for the order — Search898, Dec 02 '17 at 20:21
@Search898 Just so that we are clear on what accepting solutions mean on Stackoverflow terminology - It's done by clicking on the hollow checkmark next to the solution. Since, you haven't done this on your questions, I thought this could be instructive. An example image view here - https://i.stack.imgur.com/LkiIZ.png. Apologies if you were aware and needed more time before accepting, which is perfectly normal. — Divakar, Dec 02 '17 at 20:31

score 1 · Answer 1 · answered Nov 28 '17 at 12:38

Some comments about your code; as-is, it does not do what you want, it shall print some stuff, did you even try to run it? Could you show us what you obtain?

first, simply do a range(len(von_knoten)); this will do what you want, as range starts at 0 by default, and ends one step before the end.
if you delete some items from the input lists, and try to access to items at end of them, you will likely obtain IndexErrors, this before exhausting the analysis of your input lists.
you do some sp.delete but we do not know what that is (neither do the code), this will raise AttributeErrors.
alas, please do not use except:. This will catch Exceptions you never dreamt of, and may explain why you don't understand what's wrong.

Then, what about using zip built-in function to obtain sorted two-dimensions tuples, and remove the duplicates ? Something like:

>>> von_knoten = [0, 0, 1, 1, 1, 2, 2, 2, 3, 4]
>>> zu_knoten =  [1, 2, 0, 2, 3, 0, 1, 4, 1, 2]
>>> set(tuple(sorted([m, n])) for m, n in zip(von_knoten, zu_knoten))
{(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)}

I let you work around this to obtain the exact thing you're looking for.

hi, this is my result `0.column equal 2.column [0 0 1 1 2 2 2 3 4] [1 2 2 3 0 1 4 1 2] 1.column equal 4.column [0 0 1 1 2 2 3 4] [1 2 2 3 1 4 1 2] 2.column equal 4.column [0 0 1 1 2 3 4] [1 2 2 3 4 1 2] 3.column equal 5.column [0 0 1 1 2 4] [1 2 2 3 4 2] 5.column equal 4.column [0 0 1 1 4] [1 2 2 3 2] end` — Search898, Nov 28 '17 at 12:53

score 1 · Answer 2 · answered Nov 28 '17 at 12:38

1

You are trying to build up a collection of pairs you haven't seen before. You can use not in but need to check this either way round:

L = []
for x,y in zip(von_knoten, zu_knoten):
  if (x, y) not in L and (y, x ) not in L:
    L.append((x, y))

This gives a list of tuples

[(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)]

which you can reshape.

answered Nov 28 '17 at 12:38

doctorlove

18,872
2
46
62

1

It will be faster (OP is concerned about performance) if `L` is a `set` (`in` and `not in` are `O(1)` operations on sets). This will, however, require to call `sorted` on `L` if OP cares about order. – DeepSpace Nov 28 '17 at 12:41
True - I went for simplest... but they did ask for fastest too. – doctorlove Nov 28 '17 at 12:49
@DeepSpace Calling `sorted` makes no sense. If order matters, then you simply maintain the set and the list in parallel, adding a new pair to both if it's missing from the set. There is no need for this to be anything other than O(n). – Tom Karzes Nov 28 '17 at 12:55

Divakar · Answer 3 · 2017-11-28T13:44:29.483

Here's a vectorized output -

def unique_pairs(von_knoten, zu_knoten):
    s = np.max([von_knoten, zu_knoten])+1
    p1 = zu_knoten*s + von_knoten
    p2 = von_knoten*s + zu_knoten
    p = np.maximum(p1,p2)
    sidx = p.argsort(kind='mergesort')
    ps = p[sidx]
    m = np.concatenate(([True],ps[1:] != ps[:-1]))
    sm = sidx[m]
    return von_knoten[sm],zu_knoten[sm]

Sample run -

In [417]: von_knoten = np.array([0, 0, 1, 1, 1, 2, 2, 2, 3, 4])
     ...: zu_knoten =  np.array([1, 2, 0, 2, 3, 0, 1, 4, 1, 2])

In [418]: unique_pairs(von_knoten, zu_knoten)
Out[418]: (array([0, 0, 1, 1, 2]), array([1, 2, 2, 3, 4]))

score 0 · Answer 4 · answered Nov 28 '17 at 14:11

Using np.unique and the void view method from here

def unique_pairs(a, b):
    c = np.sort(np.stack([a, b], axis = 1), axis = 1)
    c_view = np.ascontiguousarray(c).view(np.dtype((np.void,
                                          c.dtype.itemsize * c.shape[1])))
    _, i = np.unique(c_view, return_index = True)
    return a[i], b[i]

Find unique pairs of array with Python

4 Answers4