
I have two versions of logic for finding the anagrams of a word in a list of words:

  1. The first one uses Counter from Python's collections module:

from collections import Counter

def anagram_checker(cls, word=None, words=[]):  # note: a mutable default like words=[] is shared across calls

    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = Counter(word)
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if Counter(given_word) == counter_word:
                    anagrams.append(given_word)
    return anagrams
  2. The second one has the same functionality but uses the sorted built-in function:

def anagram_checker(cls, word=None, words=[]):

    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        sorted_word = sorted(word)  # sorted() already returns a list
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if sorted(given_word) == sorted_word:
                    anagrams.append(given_word)
    return anagrams

Which one has better time complexity? That is, does comparing Python Counter objects have better complexity than comparing sorted lists?

If I am not wrong, comparing lists has a complexity of O(n). What is the complexity of comparing two Counter objects?

I have searched various documentation pages but have not found a satisfactory answer.

Kindly help.

  • well, you are sorting and creating a new list in the for loop in example 2, so I would bet example 1 is more efficient (Counter is implemented with dicts). – Andrej Kesely Aug 01 '18 at 16:59
  • Comparing counters and comparing lists are both O(n). However, creating a counter is also O(n), which is better than sorting, which is O(n log n). – zvone Aug 01 '18 at 17:49

1 Answer

I did some measuring, and although comparing lists and comparing Counters are both O(n), and creating a Counter is O(n), which is better than sorting's O(n log n), the anagram_checker that sorts is faster in practice.

from timeit import timeit
from collections import Counter

def anagram_checker_1(word=None, words=[]):
    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = Counter(word)
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if Counter(given_word) == counter_word:
                    anagrams.append(given_word)
    return anagrams


def anagram_checker_2(word=None, words=[]):
    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = list(sorted(word))
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if list(sorted(given_word)) == counter_word:
                    anagrams.append(given_word)
    return anagrams

print(timeit("anagram_checker_1('battle', ['battet', 'batlet', 'battel', 'tablet'])", globals=globals(), number=100_000))
print(timeit("anagram_checker_2('battle', ['battet', 'batlet', 'battel', 'tablet'])", globals=globals(), number=100_000))

Output:

2.3342012430075556
0.47786532100872137

Profiling the anagram 1 shows this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.662    0.000    2.512    0.000 /usr/lib/python3.6/collections/__init__.py:517(__init__)
   500000    0.501    0.000    0.821    0.000 /usr/lib/python3.6/abc.py:180(__instancecheck__)
   500000    0.438    0.000    1.808    0.000 /usr/lib/python3.6/collections/__init__.py:586(update)
   100000    0.433    0.000    2.978    0.000 example.py:4(anagram_checker_1)
  1000006    0.320    0.000    0.320    0.000 /usr/lib/python3.6/_weakrefset.py:70(__contains__)
   500000    0.283    0.000    0.283    0.000 {built-in method _collections._count_elements}
   500002    0.225    0.000    1.047    0.000 {built-in method builtins.isinstance}
  1100000    0.090    0.000    0.090    0.000 {built-in method builtins.len}
        1    0.042    0.042    3.020    3.020 <timeit-src>:2(inner)
   300000    0.025    0.000    0.025    0.000 {method 'append' of 'list' objects}

So it is clear that Python's overhead in creating Counter objects outweighs any algorithmic-complexity advantage here.
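You can see that constant-factor overhead in isolation by timing just the per-word preprocessing step of each approach (a minimal sketch, not part of the measurements above; the word and repeat count are arbitrary):

```python
from timeit import timeit
from collections import Counter

word = "battle"

# Time only the preprocessing step each approach does per word.
t_counter = timeit(lambda: Counter(word), number=100_000)
t_sorted = timeit(lambda: sorted(word), number=100_000)

# For a short word the O(n log n) sort is typically faster than building
# a Counter: Counter construction pays Python-level dict and isinstance
# overhead on every call, while sorted() is a single C-level operation.
print(f"Counter: {t_counter:.3f}s  sorted: {t_sorted:.3f}s")
```

The asymptotic advantage of counting would only start to matter for much longer inputs, where the n log n factor of sorting can no longer hide behind Counter's fixed per-call cost.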

EDIT:

Profiling report from Anagram 2, for comparison:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.372    0.000    0.372    0.000 {built-in method builtins.sorted}
   100000    0.353    0.000    0.762    0.000 example.py:18(anagram_checker_2)
        1    0.041    0.041    0.803    0.803 <timeit-src>:2(inner)
   300000    0.028    0.000    0.028    0.000 {method 'append' of 'list' objects}
   100000    0.009    0.000    0.009    0.000 {built-in method builtins.len}
Andrej Kesely
  • Thank you so much. Just one small question: how does Python do dict comparison? Can you give me some idea of how it works behind the scenes? I can understand that comparing lists takes O(n) time, since we have to compare each element, but how does it work behind the scenes for Counter objects? Many thanks – Subhayan Bhattacharya Aug 01 '18 at 18:54
  • @user1867151 Counter objects are basically dicts, and here is what Python does when comparing dicts: https://stackoverflow.com/questions/17217225/what-does-the-operator-actually-do-on-a-python-dictionary – Andrej Kesely Aug 01 '18 at 18:58
  • Thanks a million! – Subhayan Bhattacharya Aug 01 '18 at 19:12
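To round out the dict-comparison question from the comments, here is a minimal sketch of the equality semantics (the example words are my own): Counter is a dict subclass, so == checks that both sides have the same keys and that each key maps to an equal value, which is O(n) on average in the number of distinct keys.

```python
from collections import Counter

a = Counter("battle")   # {'b': 1, 'a': 1, 't': 2, 'l': 1, 'e': 1}
b = Counter("tablet")   # same letters with the same counts

# dict equality: same set of keys, and each key maps to an equal value.
# Each key lookup is an average-case O(1) hash lookup, so the whole
# comparison is O(n) in the number of distinct keys.
print(a == b)                   # True: anagrams produce equal Counters
print(a == dict(b))             # True: compares equal to a plain dict too
print(a == Counter("battles"))  # False: the extra 's' key breaks equality
```

So comparing two already-built Counters costs about the same as comparing two already-sorted lists; the difference the benchmark exposes lives almost entirely in the construction step.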