
I have two versions of logic for finding the anagrams of a word in a list of words:

  1. The first one uses Counter from Python's collections module:

from collections import Counter

def anagram_checker(cls, word=None, words=[]):  # note: a mutable default like words=[] is shared across calls

    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = Counter(word)
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if Counter(given_word) == counter_word:
                    anagrams.append(given_word)
    return anagrams
  2. The second one has the same functionality but uses the sorted built-in function:

def anagram_checker(cls, word=None, words=[]):

    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        sorted_word = sorted(word)  # sorted() already returns a list
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if sorted(given_word) == sorted_word:
                    anagrams.append(given_word)
    return anagrams

Which one has better time complexity? That is, does comparing Python Counter objects have better complexity than comparing sorted lists?

If I am not wrong, comparing lists has a complexity of O(n). What is the complexity of comparing two Counter objects?

I have searched various documentation pages but have not found a satisfactory answer.

Kindly help.

  • well, you are sorting and creating a new list in the for loop in example 2, so I would bet example 1 is more efficient (Counter is implemented with dicts). – Andrej Kesely Aug 01 '18 at 16:59
  • Comparing counters and comparing lists are both O(n). However, creating a counter is also O(n), which is better than sorting, which is O(n log n). – zvone Aug 01 '18 at 17:49

1 Answer

I did some measuring, and although comparing lists and comparing Counters are both O(n), and creating a Counter is O(n), which is better than sorting's O(n log n), the anagram_checker that sorts is faster in practice.

from timeit import timeit
from collections import Counter

def anagram_checker_1(word=None, words=[]):
    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = Counter(word)
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if Counter(given_word) == counter_word:
                    anagrams.append(given_word)
    return anagrams


def anagram_checker_2(word=None, words=[]):
    anagrams = []

    if len(words) == 0:  # if the second arg list is empty then there is nothing to check
        return anagrams
    else:
        counter_word = list(sorted(word))
        for given_word in words:
            if given_word != word:  # Cannot be same as the word in question
                if list(sorted(given_word)) == counter_word:
                    anagrams.append(given_word)
    return anagrams

print(timeit("anagram_checker_1('battle', ['battet', 'batlet', 'battel', 'tablet'])", globals=globals(), number=100_000))
print(timeit("anagram_checker_2('battle', ['battet', 'batlet', 'battel', 'tablet'])", globals=globals(), number=100_000))

Output:

2.3342012430075556
0.47786532100872137

Profiling the anagram 1 shows this:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.662    0.000    2.512    0.000 /usr/lib/python3.6/collections/__init__.py:517(__init__)
   500000    0.501    0.000    0.821    0.000 /usr/lib/python3.6/abc.py:180(__instancecheck__)
   500000    0.438    0.000    1.808    0.000 /usr/lib/python3.6/collections/__init__.py:586(update)
   100000    0.433    0.000    2.978    0.000 example.py:4(anagram_checker_1)
  1000006    0.320    0.000    0.320    0.000 /usr/lib/python3.6/_weakrefset.py:70(__contains__)
   500000    0.283    0.000    0.283    0.000 {built-in method _collections._count_elements}
   500002    0.225    0.000    1.047    0.000 {built-in method builtins.isinstance}
  1100000    0.090    0.000    0.090    0.000 {built-in method builtins.len}
        1    0.042    0.042    3.020    3.020 <timeit-src>:2(inner)
   300000    0.025    0.000    0.025    0.000 {method 'append' of 'list' objects}

So it is clear that Python's overhead in creating Counter objects outweighs any algorithmic-complexity advantage here.
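You can see that constant-factor overhead in isolation by timing just the per-word preprocessing step of each approach (a minimal sketch, not part of the measurements above; the word and repeat count are arbitrary):

```python
from timeit import timeit
from collections import Counter

word = "battle"

# Time only the preprocessing step each approach does per word.
t_counter = timeit(lambda: Counter(word), number=100_000)
t_sorted = timeit(lambda: sorted(word), number=100_000)

# For a short word the O(n log n) sort is typically faster than building
# a Counter: Counter construction pays Python-level dict and isinstance
# overhead on every call, while sorted() is a single C-level operation.
print(f"Counter: {t_counter:.3f}s  sorted: {t_sorted:.3f}s")
```

The asymptotic advantage of counting would only start to matter for much longer inputs, where the n log n factor of sorting can no longer hide behind Counter's fixed per-call cost.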

EDIT:

Profiling report from Anagram 2, for comparison:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   500000    0.372    0.000    0.372    0.000 {built-in method builtins.sorted}
   100000    0.353    0.000    0.762    0.000 example.py:18(anagram_checker_2)
        1    0.041    0.041    0.803    0.803 <timeit-src>:2(inner)
   300000    0.028    0.000    0.028    0.000 {method 'append' of 'list' objects}
   100000    0.009    0.000    0.009    0.000 {built-in method builtins.len}
Andrej Kesely
  • Thank you so much. Just one small question: how does Python do dict comparison? Can you give me some idea of how it works behind the scenes? I can understand that comparing lists takes O(n) time, since we have to compare each element, but how does it work behind the scenes for Counter objects? Many thanks – Subhayan Bhattacharya Aug 01 '18 at 18:54
  • @user1867151 Counter objects are basically dicts, and here is what Python does when comparing dicts: https://stackoverflow.com/questions/17217225/what-does-the-operator-actually-do-on-a-python-dictionary – Andrej Kesely Aug 01 '18 at 18:58
  • Thanks a million! – Subhayan Bhattacharya Aug 01 '18 at 19:12
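To round out the dict-comparison question from the comments, here is a minimal sketch of the equality semantics (the example words are my own): Counter is a dict subclass, so == checks that both sides have the same keys and that each key maps to an equal value, which is O(n) on average in the number of distinct keys.

```python
from collections import Counter

a = Counter("battle")   # {'b': 1, 'a': 1, 't': 2, 'l': 1, 'e': 1}
b = Counter("tablet")   # same letters with the same counts

# dict equality: same set of keys, and each key maps to an equal value.
# Each key lookup is an average-case O(1) hash lookup, so the whole
# comparison is O(n) in the number of distinct keys.
print(a == b)                   # True: anagrams produce equal Counters
print(a == dict(b))             # True: compares equal to a plain dict too
print(a == Counter("battles"))  # False: the extra 's' key breaks equality
```

So comparing two already-built Counters costs about the same as comparing two already-sorted lists; the difference the benchmark exposes lives almost entirely in the construction step.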