How to efficiently compare two unordered lists (not sets)?

Question

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]

a & b should be considered equal, because they have exactly the same elements, only in different order.

The thing is, my actual lists will consist of objects (my class instances), not integers.

what's the expected size of the real lists? Will the lists being compared be of comparable sizes or very different? Do you expect most lists to match or not? — Dmitry B., Oct 19 '11 at 22:23

score 357 · Accepted Answer · edited Aug 24 '16 at 08:17

357

O(n): The Counter() method is best (if your objects are hashable):

def compare(s, t):
    return Counter(s) == Counter(t)

O(n log n): The sorted() method is next best (if your objects are orderable):

def compare(s, t):
    return sorted(s) == sorted(t)

O(n * n): If the objects are neither hashable, nor orderable, you can use equality:

def compare(s, t):
    t = list(t)   # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

edited Aug 24 '16 at 08:17

CrowbarKZ

1,203
11
18

answered Oct 19 '11 at 23:20

Raymond Hettinger

216,523
63
388
485

1

Thank you. I converted each object to a string then used the Counter() method. – johndir Oct 21 '11 at 22:28
Hey @Raymond, I recently encountered this question on an interview and I used `sorted()`, admittedly not knowing about `Counter`. The interviewer insisted there was a more efficient method and clearly I drew a blank. After extensive testing in python 3 with the `timeit` module, sorted consistently comes out faster on lists of integers. On lists of, 1k items, about 1.5% slower and on short lists, 10 items, 7.5% slower. Thoughts? – Arctelix Oct 20 '16 at 06:07
4

For short lists, big-O analysis is usually irrelevant because the timings are dominated by constant factors. For the longer lists, I suspect something is wrong with your benchmarking. For 100 ints with 5 repeats each, I get: 127 usec for sorted and 42 for Counter (about 3x faster). At 1,000 ints with 5 repeats, Counter is 4x faster. ``python3.6 -m timeit -s 'from collections import Counter' -s 'from random import shuffle' -s 't=list(range(100)) * 5' -s 'shuffle(t)' -s 'u=t[:]' -s 'shuffle(u)' 'Counter(t)==Counter(u)'`` – Raymond Hettinger Oct 20 '16 at 15:54
@Raymond Indeed we're getting different results. I posted my setup to a chat room `sorted vs counter`.. I'm very curious as to whats going on here. – Arctelix Oct 20 '16 at 17:01
We could continue this in a [chat] (http://chat.stackoverflow.com/rooms/126253/sorted-vs-counter) – Arctelix Oct 20 '16 at 18:36
6

No thanks. I don't have much interest in debugging spurious timing scripts. There is a lot going on here (pure python vs C code, timsort being applied to randomized data vs semi-ordered data, different implementation details across versions, how many duplicates are in the data, etc.) – Raymond Hettinger Oct 21 '16 at 18:03
@RaymondHettinger you state `Counter` method has `O(n)` but what about insertion of keys/values in the counter dictionaries? Well I suppose that makes it `O(n + log(n))` – Jean-François Fabre Sep 09 '17 at 21:12
@RaymondHettinger Can you recursively be able to solve using just the equality method? – John Strood Sep 02 '18 at 17:41
3

@Jean-FrançoisFabre: `O(n + log(n))` simplifies to `O(n)`, since `n` is a higher order of complexity. – StriplingWarrior Feb 18 '22 at 21:37

Mark Byers · Answer 2 · 2011-10-19T22:27:10.907

23

You can sort both:

sorted(a) == sorted(b)

A counting sort could also be more efficient (but it requires the object to be hashable).

>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True

edited Oct 19 '11 at 22:27

answered Oct 19 '11 at 22:17

Mark Byers

811,555
193
1,581
1,452

The Counter does use hashing, but objects are not unhashable per se. You just have to implements a sensible `__hash__`, but that might be impossible for collections. – Jochen Ritzel Oct 19 '11 at 22:23
2

sorted won't work for everything either, eg complex numbers `sorted([0, 1j])` – John La Rooy Oct 19 '11 at 22:56
1

sorted() also doesn't work with sets where the comparison operators have been overridden for subset/superset tests. – Raymond Hettinger Oct 19 '11 at 23:10

score 15 · Answer 3 · answered Oct 19 '11 at 23:00

If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)

In the general case you can't rely on being able to sort, or has the elements, so you need a fallback like this, which is unfortunately O(n^2)

len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)

cleder · Answer 4 · 2020-10-30T11:12:25.070

If you have to do this in tests: https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual

assertCountEqual(first, second, msg=None)

Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.

Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.

New in version 3.2.

or in 2.7: https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual

Outside of tests I would recommend the Counter method.

(What does this add to [jarekwg's answer](http://stackoverflow.com/a/34694504/3789665)?) — greybeard, Oct 13 '16 at 09:18

score 6 · Answer 5 · answered Jan 09 '16 at 14:13

6

If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) and assertItemsEqual(a, b) (2.7<=py<3.2).

Works on sequences of unhashable objects too.

answered Jan 09 '16 at 14:13

jarekwg

375
3
8

2

Wow, man! This is not intuitive - the name could suggest it's just `len(a) == len(b)` and not `MagicCounter(a) == MagicCounter(b)`, where `MagicCounter` is a `Counter` which can do unhashable objects... – Tomasz Gandor Dec 22 '20 at 19:06

kindall · Answer 6 · 2011-10-19T22:31:17.310

The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:

sorted(a) == sorted(b)

It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.

sorted(a, key=id) == sorted(b, key==id)

(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)

If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.

sorted(a, key=repr) == sorted(b, key==repr)

If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.

score 4 · Answer 7 · answered Jan 02 '17 at 21:45

If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:

from collections import Counter
...
if Counter(map(id,a)) == Counter(map(id,b)):
    print("Lists a and b contain the same objects")

score 1 · Answer 8 · edited May 23 '17 at 12:09

1

I hope the below piece of code might work in your case :-

if ((len(a) == len(b)) and
   (all(i in a for i in b))):
    print 'True'
else:
    print 'False'

This will ensure that all the elements in both the lists a & b are same, regardless of whether they are in same order or not.

For better understanding, refer to my answer in this question

edited May 23 '17 at 12:09

Community

1
1

answered Jan 22 '15 at 13:17

Pabitra Pati

457
4
12

score 1 · Answer 9 · answered Oct 19 '11 at 23:43

1

Let a,b lists

def ass_equal(a,b):
try:
    map(lambda x: a.pop(a.index(x)), b) # try to remove all the elements of b from a, on fail, throw exception
    if len(a) == 0: # if a is empty, means that b has removed them all
        return True 
except:
    return False # b failed to remove some items from a

No need to make them hashable or sort them.

answered Oct 19 '11 at 23:43

Umur Kontacı

35,403
8
73
96

1

Yes but this is O(n**2) as several other posters stated, so should only be used if the other methods don't work. It also assumes `a` supports `pop` (is mutable) and `index` (is a sequence). Raymond's assumes neither while gnibbler's assumes only a sequence. – agf Oct 20 '11 at 00:05

Şahin Murat Oğur · Answer 10 · 2021-03-28T22:00:59.987

0

You can write your own function to compare the lists.

Let's get two lists.

list_1=['John', 'Doe'] 
list_2=['Doe','Joe']

Firstly, we define an empty dictionary, count the list items and write in the dictionary.

def count_list(list_items):
    empty_dict={}
    for list_item in list_items:
        list_item=list_item.strip()
        if list_item not in empty_dict:
            empty_dict[list_item]=1
        else:
            empty_dict[list_item]+=1
    return empty_dict

After that, we'll compare both lists by using the following function.

def compare_list(list_1, list_2):
    if count_list(list_1)==count_list(list_2):
        return True
    return False
compare_list(list_1,list_2)

edited Mar 28 '21 at 22:00

answered Mar 28 '21 at 12:49

Şahin Murat Oğur

128
7

sorry, I've typed wrongly the name of the empty_dict variable . I have corrected it. Thanks for the feedback! – Şahin Murat Oğur Mar 28 '21 at 21:24
(What if `compare_list()` just read `return count_list(list_1)==count_list(list_2)`?) – greybeard Mar 28 '21 at 21:38

Induane · Answer 11 · 2022-07-27T18:42:41.170

from collections import defaultdict

def _list_eq(a: list, b: list) -> bool:
    if len(a) != len(b):
        return False
    b_set = set(b)
    a_map = defaultdict(lambda: 0)
    b_map = defaultdict(lambda: 0)
    for item1, item2 in zip(a, b):
        if item1 not in b_set:
            return False
        a_map[item1] += 1
        b_map[item2] += 1
    return a_map == b_map

Sorting can be quite slow if the data is highly unordered (timsort is extra good when the items have some degree of ordering). Sorting both also requires fully iterating through both lists.

Rather than mutating a list, just allocate a set and do a left-->right membership check, keeping a count of how many of each item exist along the way:

If the two lists are not the same length you can short circuit and return False immediately.
If you hit any item in list a that isn't in list b you can return False
If you get through all items then you can compare the values of a_map and b_map to find out if they match.

This allows you to short-circuit in many cases long before you've iterated both lists.

`_list_eq([0, 0, 1] [0, 1, 1])` – greybeard Jul 27 '22 at 17:51 — greybeard, Jul 27 '22 at 17:51

score 0 · Answer 12 · answered Aug 12 '22 at 17:55

plug in this:

def lists_equal(l1: list, l2: list) -> bool:
    """

    import collections
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    ref:
        - https://stackoverflow.com/questions/9623114/check-if-two-unordered-lists-are-equal
        - https://stackoverflow.com/questions/7828867/how-to-efficiently-compare-two-unordered-lists-not-sets
    """
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    set_comp = set(l1) == set(l2)  # removes duplicates, so returns true when not sometimes :(
    multiset_comp = compare(l1, l2)  # approximates multiset
    return set_comp and multiset_comp  #set_comp is gere in case the compare function doesn't work

How to efficiently compare two unordered lists (not sets)?

12 Answers12

Linked

Related