204

Sorry for the simple question, but I'm having a hard time finding the answer.

When I compare 2 lists, I want to know if they are "equal" in that they have the same contents, but in different order.

Ex:

x = ['a', 'b']
y = ['b', 'a']

I want x == y to evaluate to True.

juliomalegria
  • 24,229
  • 14
  • 73
  • 89
toofly
  • 2,167
  • 3
  • 15
  • 9

4 Answers4

270

You can simply check whether the multisets with the elements of x and y are equal:

import collections
collections.Counter(x) == collections.Counter(y)

This requires the elements to be hashable; runtime will be in O(n), where n is the size of the lists.

If the elements are also unique, you can also convert to sets (same asymptotic runtime, may be a little bit faster in practice):

set(x) == set(y)

If the elements are not hashable, but sortable, another alternative (runtime in O(n log n)) is

sorted(x) == sorted(y)

If the elements are neither hashable nor sortable you can use the following helper function. Note that it will be quite slow (O(n²)) and should generally not be used outside of the esoteric case of unhashable and unsortable elements.

def equal_ignore_order(a, b):
    """ Use only when elements are neither hashable nor sortable! """
    unmatched = list(b)
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched
phihag
  • 278,196
  • 72
  • 453
  • 469
  • 3
    `equal_ignore_order` is a nice approach. I think, it could be improved by checking equalitiy of the lengths of `a` and `b` first. This will speeding things up a little (or a lot, depending on the input). – Jonathan Scholbach Jun 22 '21 at 07:22
  • set(x) == set(y) does not work just by itself. It would work for the specific example of this question, but it could be buggy for other cases, such as x = ['a', 'b', 'a'], y = ['b', 'a']. For this case, set(x) == set(y) would return True, but someone could argue it to be False. We would have to add len(x) == len(y) for it to fully work. However, we don´t have this issue with sorted(x) == sorted(y) – Arman Mojaver Jan 07 '23 at 10:29
  • @ArmanMojaver the answer explicitly mentions that the elements must be unique. Also, no, comparing lengths will not matter if there are duplicates. Consider the following scenario: x = ['a', 'a', 'b'], y = ['a', 'b', 'b'] Both set(x) == set(y) and len(x) == len(y) are True, yet we can clearly see that x != y. – AboodXD Apr 05 '23 at 10:43
42

Determine if 2 lists have the same elements, regardless of order?

Inferring from your example:

x = ['a', 'b']
y = ['b', 'a']

that the elements of the lists won't be repeated (they are unique) as well as hashable (which strings and other certain immutable python objects are), the most direct and computationally efficient answer uses Python's builtin sets, (which are semantically like mathematical sets you may have learned about in school).

set(x) == set(y) # prefer this if elements are hashable

In the case that the elements are hashable, but non-unique, the collections.Counter also works semantically as a multiset, but it is far slower:

from collections import Counter
Counter(x) == Counter(y)

Prefer to use sorted:

sorted(x) == sorted(y) 

if the elements are orderable. This would account for non-unique or non-hashable circumstances, but this could be much slower than using sets.

Empirical Experiment

An empirical experiment concludes that one should prefer set, then sorted. Only opt for Counter if you need other things like counts or further usage as a multiset.

First setup:

import timeit
import random
from collections import Counter

data = [str(random.randint(0, 100000)) for i in xrange(100)]
data2 = data[:]     # copy the list into a new one

def sets_equal(): 
    return set(data) == set(data2)

def counters_equal(): 
    return Counter(data) == Counter(data2)

def sorted_lists_equal(): 
    return sorted(data) == sorted(data2)

And testing:

>>> min(timeit.repeat(sets_equal))
13.976069927215576
>>> min(timeit.repeat(counters_equal))
73.17287588119507
>>> min(timeit.repeat(sorted_lists_equal))
36.177085876464844

So we see that comparing sets is the fastest solution, and comparing sorted lists is second fastest.

Russia Must Remove Putin
  • 374,368
  • 89
  • 403
  • 331
1

This seems to work, though possibly cumbersome for large lists.

>>> A = [0, 1]
>>> B = [1, 0]
>>> C = [0, 2]
>>> not sum([not i in A for i in B])
True
>>> not sum([not i in A for i in C])
False
>>> 

However, if each list must contain all the elements of other then the above code is problematic.

>>> A = [0, 1, 2]
>>> not sum([not i in A for i in B])
True

The problem arises when len(A) != len(B) and, in this example, len(A) > len(B). To avoid this, you can add one more statement.

>>> not sum([not i in A for i in B]) if len(A) == len(B) else False
False

One more thing, I benchmarked my solution with timeit.repeat, under the same conditions used by Aaron Hall in his post. As suspected, the results are disappointing. My method is the last one. set(x) == set(y) it is.

>>> def foocomprehend(): return not sum([not i in data for i in data2])
>>> min(timeit.repeat('fooset()', 'from __main__ import fooset, foocount, foocomprehend'))
25.2893661496
>>> min(timeit.repeat('foosort()', 'from __main__ import fooset, foocount, foocomprehend'))
94.3974742993
>>> min(timeit.repeat('foocomprehend()', 'from __main__ import fooset, foocount, foocomprehend'))
187.224562545
blahreport
  • 1,000
  • 6
  • 10
  • 2
    Should not be a surprise as your method is O(N^2), that is much much bigger than O(N) or O(N * log N). For every element of B (N elements) it is checking all the elements of A (N elements). The number of checks is then N * N. – RobMcZag Mar 03 '16 at 18:52
0

As mentioned in comments above, the general case is a pain. It is fairly easy if all items are hashable or all items are sortable. However I have recently had to try solve the general case. Here is my solution. I realised after posting that this is a duplicate to a solution above that I missed on the first pass. Anyway, if you use slices rather than list.remove() you can compare immutable sequences.

def sequences_contain_same_items(a, b):
    for item in a:
        try:
            i = b.index(item)
        except ValueError:
            return False
        b = b[:i] + b[i+1:]
    return not b
Grahame
  • 11
  • 1