6

suppose you have a list of tuples in a python set:

>>> pairs = set( [(0,1),(0,1),(1,0)] )
>>> print pairs
set([(0, 1), (1, 0)])

Obviously, the first two elements are duplicates and according to the definition of a set, "pairs" only holds unique elements.

However, in my particular case the tuple (i,j) defines an interaction pair. Thus, (i,j) and (j,i) are the same. I need an effective way to reduce all duplicate elements. Computation time is crucial for me since the total set could easily contain a number of element as large as 10**6. I expect the following result:

>>> pairs = set( [(0,1),(0,1),(1,0)] )
>>> pairs = remove_duplicate_interactions(pairs)
>>> print pairs
set([0,1]) or set([1,0])

I am thankful for any hint.

Edit:

Someone asked about the context. This is supposed to be used for particle simulations. Due to symmetry conditions, the force of particle i acting on j is the same as the force of j acting on i. The reduction of computation time is thus 50 %.

Rakulan S.
  • 303
  • 1
  • 2
  • 9

5 Answers5

10

How about:

In [4]: pairs = set( [(0,1),(0,1),(1,0),(1,2),(1,0),(2,1)] )

In [5]: set((a,b) if a<=b else (b,a) for a,b in pairs)
Out[5]: set([(0, 1), (1, 2)])
NPE
  • 486,780
  • 108
  • 951
  • 1,012
4
>>> pairs = [(0,1),(0,1),(1,0)]
>>> set(tuple(sorted(p)) for p in pairs)
set([(0, 1)])
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
3

I would subtype tuple and always make the pair ordered:

class InteractionPair(tuple):
    def __new__(cls, a, b):
        if a <= b:
            return tuple.__new__(cls, (a, b))
        return tuple.__new__(cls, (b, a))

Example:

>>> set([InteractionPair(0, 1), InteractionPair(1, 0)])
set([(0, 1)])

This can also be applied to a list of standard tuples:

from itertools import starmap
pairs = [(0, 1), (0, 1), (1, 0)]
print set(starmap(InteractionPair, pairs))

Edit: Her are some timings, using a list of 1000000 random pairs:

In [14]: timeit set(starmap(InteractionPair, pairs))
1 loops, best of 3: 742 ms per loop

In [15]: timeit set((a,b) if a<=b else (b,a) for a,b in pairs)
1 loops, best of 3: 258 ms per loop

In [16]: timeit set(tuple(sorted(p)) for p in pairs)
1 loops, best of 3: 1.02 s per loop

As far as performance is concerned, the solution by aix wins.

Community
  • 1
  • 1
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • @larsmans: Not sure if it really is. This approach might sometimes be useful, but you would need to know more about the context to judge if this is really worth it. – Sven Marnach Feb 29 '12 at 16:18
  • I like object-oriented approaches, so this seems quite elegant to me too. But how about efficiency ? Intuitively, this seems quite computation heavy ?! – Rakulan S. Feb 29 '12 at 16:20
  • @RakulanS.: Added a few simple timings. It's not that bad. – Sven Marnach Feb 29 '12 at 16:25
2

If you can use frozenset instead of tuples, you can store each interaction pair as a set. An alternative is to store all your tuples in some canonical form (e.g., so that i < j). No idea if this has a speed advantage over frozenset, but the simpler the form, the faster python will be able to hash it and compute the set of unique elements.

alexis
  • 48,685
  • 16
  • 101
  • 161
0
pairs = set(frozenset(p) for p in ((0,1),(0,1),(1,0)) )

Will achieve what you need. See my answer here for further information: How do I perform deep equality comparisons of two lists of tuples?

If necessary, you can reconstruct these as pairs afterwards.

Community
  • 1
  • 1
Marcin
  • 48,559
  • 18
  • 128
  • 201