2

I'm trying to check if 2 strings are anagrams. This solution is simple, but not efficient (Ologn) I know I could use Collections and Counter, then compare the occurrence of each character, but I'm trying to avoid any modules for an interview. What would be the fastest way to solve this problem? (Perhaps, checking occurrence of each character?)

def check(word1,word2):

    return sorted(word1)==sorted(word2)
Angular
  • 603
  • 2
  • 9
  • 24
  • 1
    Fast isn't even the issue here. Your code has multiple bugs in it, making it think "car" and "cat" are anagrams. – user2357112 Dec 08 '15 at 23:55
  • 3
    `if sorted(l1) == sorted(l2)` is , depending on how `sort` is done, probably `O(nlogn)` – R Nar Dec 08 '15 at 23:55
  • @user2357112 O dam true. Why is it doing that. Sorry i'm new. – Angular Dec 08 '15 at 23:56
  • @RNar Can you explain what you mean by "depanding on". is O(nlogn) faster than O(n)? – Angular Dec 08 '15 at 23:57
  • you are returning at your very first iteration. also, you should be using `zip`, not a nested `for` loop – R Nar Dec 08 '15 at 23:57
  • Possible duplicate: http://stackoverflow.com/questions/14990725/checking-strings-against-each-other-anagrams – drets Dec 09 '15 at 00:01
  • 1
    The O(nlogn) of the sort is the least of your problems. If you are only comparing 2 words - who cares? The O(n*n) you will have for finding anagrams in a collections of words is a more interesting problem if you ask me – John La Rooy Dec 09 '15 at 00:13
  • Even though sorted is O(nlogn), unless your strings are more than a million or so characters it's probably the fastest solution in Python anyway – John La Rooy Dec 09 '15 at 03:24

3 Answers3

3

Your code doesn't even return a correct value. This one-liner is O(n log n):

return sorted(word1) == sorted(word2)

For an O(n) solution, you can count all characters:

from collections import Counter
# ...
def check(a, b)
  return Counter(a) == Counter(b)

Without collections it is much longer:

def check(a, b):
    chars = dict.fromkeys(a + b, 0)
    for c in a:
        chars[c] += 1
    for c in b:
        chars[c] -= 1
    return not any(chars.values())

This code does the following:

  • chars = dict.fromkeys(a + b, 0): Creates a dict, which has all the occurring characters in either word as keys set to 0.
  • for c in a: chars[c] += 1: this will iterate over a and count the occurrences of each character in it. chars now contains the count of separate characters, (and some zeroes for characters in b but not a)
  • for c in b: chars[c] -= 1: much the same as before, but instead this will subtract the character counts of b from chars
  • return not any(chars.values()): chars['h'] == 0 if and only if a and b has the same amount of 'h'. This line checks if chars has only zeroes as values, meaning that all characters have the same count in both inputs. (as any returns if there is any truthy value in the sequence. 0 is falsy, every other integer is truthy.)

Both lists get iterated over once. Assuming O(1) access time for dictionaries makes the whole algorithm run in O(n) time (where n is the total length of the inputs). Space complexity is O(n) too (all characters can be distinct). Don't make that mistake when they ask you complexity. It's not necessary time complexity.

aschultz
  • 1,658
  • 3
  • 20
  • 30
Tamas Hegedus
  • 28,755
  • 12
  • 63
  • 97
  • 1
    from collections import Counter – timgeb Dec 09 '15 at 00:01
  • Ye my bad. I don't want to use collections though, I don't think the interviewer will like it. And the first solution isn't the fastest right? – Angular Dec 09 '15 at 00:03
  • @Angular: Internally python uses an O(n log n) algorithm for sorting, so yes, the first one is asymptotycally slower. – Tamas Hegedus Dec 09 '15 at 00:06
  • 1
    @Angular I don't know all the interviewers, but if I was one, I'd like people knowing the platform they're working on. Writing good algorithms is an important skill. Knowing the strengths of the platform you use is another. – Tamas Hegedus Dec 09 '15 at 00:16
  • You can simplify by initialising `chars = dict.fromkeys(a + b, 0)` – John La Rooy Dec 09 '15 at 00:24
  • @JohnLaRooy yeah, let me think about that for a moment – Tamas Hegedus Dec 09 '15 at 00:26
  • @JohnLaRooy I was thinking about building two dicts for each string and then compare those. That would still need an inner function – Tamas Hegedus Dec 09 '15 at 00:28
  • @JohnLaRooy I don't know, allocating a new string.. but avoiding get-defaults... or make two dicts... I dont think any of these would count in real life, complexity in big O remains the same either... I let you decide that - if you feel that's somewhat better, please edit it. – Tamas Hegedus Dec 09 '15 at 00:35
  • I already did :) adding all the keys at once is probably also more time efficient than one at a time and probably enough to make up for the extra string object – John La Rooy Dec 09 '15 at 02:10
  • Can someone explain step-by-step what's happening in the last code... let's say the a='cat' and b ='bob'.So you make a dict that looks like this {'a':0,'t':0,'c':0,'b':0,'o':0}. I don't get the rest... – Angular Dec 09 '15 at 06:02
2

Here's a nice option from http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/AnAnagramDetectionExample.html:

def anagramSolution(s1,s2):

    TABLE_SIZE = 128
    c1 = [0]*TABLE_SIZE
    c2 = [0]*TABLE_SIZE

    for ch in s1:
        pos = ord(ch)
        c1[pos] = c1[pos] + 1

    for ch in s2:
        pos = ord(ch)
        c2[pos] = c2[pos] + 1

    j = 0
    stillOK = True
    while j<TABLE_SIZE and stillOK:
        if c1[j]==c2[j]:
            j = j + 1
        else:
            stillOK = False

    return stillOK

This runs in O(n). Essentially, you loop over both strings, counting the occurrences of each letter. In the end, you can simply iterate over each letter, making sure the counts are equal.

As noted in the comments, this will have a harder time scaling for unicode. If you expect unicode, you would likely want to use a dictionary.

akosel
  • 1,183
  • 9
  • 14
  • 2
    I think this answer has value because it is the no-import homework solution. however, no need to keep looping once `stillOK` is `False`. You can simply return `False` at this point. – timgeb Dec 09 '15 at 00:04
  • Indeed. The loop will terminate once stillOk is set to False. See `while j<26 and stillOK:` – akosel Dec 09 '15 at 00:05
  • so ord is a built in function, that returns the ASCII character or each letter or something? – Angular Dec 09 '15 at 00:05
  • That is correct. Which raises a good point, this assumes the strings are lower case. If they were upper-case, it will not work as expected. You could make the array twice as big to account for upper-case letters, subtracting `ord('A')` from each. You could also use a dictionary, only storing the characters you encounter. In the end you would then have to loop over the keys. – akosel Dec 09 '15 at 00:11
  • requires modification even for capital letters – Tamas Hegedus Dec 09 '15 at 00:37
  • Yeah, that's true. I fixed that. Granted, if you wanted to support anything other than the OG ASCII chart, you would need to grow the size of the list. – akosel Dec 09 '15 at 00:39
  • What is the ord(s2[i])-ord('a') used for? – Angular Dec 09 '15 at 05:45
  • That is used to set the index to the ASCII value. It was based at ord('a'), which is 97. By doing this any instance of 'a' would be indexed at 0, 'b' at 1, and so on. I modified this to allow for more letters. As has been pointed out, there are quite a few unicode characters, so it is worth considering how this solution would scale. – akosel Dec 09 '15 at 14:42
0

I'd write it like this without imports:

def count_occurences(mystring):
    occs = {}
    for char in mystring:
        if char in occs:
            occs[char] += 1
        else:
            occs[char] = 1
    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)

Or, if you can use imports, just not a Counter, use a defaultdict:

from collections import defaultdict

def count_occurences(mystring):
    occs = defaultdict(int)
    for char in mystring:
        occs[char] += 1

    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)
timgeb
  • 76,762
  • 20
  • 123
  • 145