How to search for anagrams in O(logN) time given an input from the user?

Question

Hi I've been thinking of how to implement this for days.

I'm trying to implement a programme that reads a dictionary into a list and sorts it in O(N) time.After that, I have to search for the anagrams in the list given an input from the user in O(log N) time.I can sort each word by letter and sort the list alphabetically in O(N).

Since I'm trying to search in O(logN) time, I tried to use binary search by sorting every letter in each word and using that as a key to identify the anagrams. For example 'act' is the key for anagram group 'act,'cat','tac'.

arr=['act','cat','tac','bad','fad']

After sorting

[['act', 'act'], ['cat', 'act'], ['tac', 'act'], ['bad', 'abd'],  ['fad', 'adf']]

But binary search only finds ONE target so it will only return 'tac' for anagram group under 'act'.My binary search code:

def binarySearch(arr, lower, upper, target):
anagramList=[]
if upper >= lower:
    mid = lower + ((upper - lower) // 2)
    if areAnagrams(arr[mid][1],target):
        anagramList.append(arr[mid])
    elif arr[mid] > target:
        return binarySearch(arr, lower, mid - 1, target)
    else:
        return binarySearch(arr, mid + 1, upper, target)
return anagramList

I tried to group them like this

[['act','act','cat','tac'],['bad','abd'],['fad','daf]]

but it takes O(N^2) complexity which is larger than O(N)?Can anyone suggest how I should go about this?Thanks.

Edit: For example, if the query string is alppe, the output will consist of the words appel, and apple.

You probably want to use a list, right? If not, you could always resort to a dictionary. — MegaIng, Aug 10 '18 at 11:08
There's no partial matches allowed, right? You always need the full word? Can you supply some more inputs and expected outputs so we know exactly what you are trying to do. — Håken Lid, Aug 10 '18 at 11:08
@Megalng Well dictionary takes O(N) insert so worst case is O(N^2) so no — Sook Lim, Aug 10 '18 at 11:10
@SookLim How is the answer by HakenLid O(N^2)? Insert is O(N) and lookup is O(1). But maybe I am misunderstanding big-O notation. — MegaIng, Aug 10 '18 at 11:14
Ya but by inserting every word into the dictionary it takes O(N^2) time.Say for i in range(len(list)) insert into dict? — Sook Lim, Aug 10 '18 at 11:15

Håken Lid · Answer 1 · 2018-08-10T11:18:02.033

1

You can use a dictionary with the key being the word with letters sorted.

from collections import defaultdict
anagrams = defaultdict(list)

arr=['act','cat','tac','bad','fad']

for word in arr:
    anagrams[''.join(sorted(word))].append(word)

def get_anagram(user_input):
    return anagrams[''.join(sorted(user_input))]

Example:

>>> get_anagram('tca')
['act', 'cat', 'tac']

edited Aug 10 '18 at 11:18

answered Aug 10 '18 at 11:12

Håken Lid

22,318
9
52
67

Then the time complexity for reading and sorting the file wouldn't be O(N)? – Sook Lim Aug 10 '18 at 11:14
@SookLim The Python wiki states that insert is O(1) (worst case O(N)) and getitem the same: https://wiki.python.org/moin/TimeComplexity with n being the size of the container. Multply that with the size of the input array and you achive worst case O(n*n), you are correct, but that is unlikely. – MegaIng Aug 10 '18 at 11:20
If `N` is the length of the input `arr`, building the lookup hashmap is `O(n)`. If individual words are within some upper bound (from the example in the question, that should not be an unreasonable assumption), then we can assert sorting keys also have a upper limit constant time. – Håken Lid Aug 10 '18 at 11:20
@Megalng Sorry I meant worst case time complexity as O(N) – Sook Lim Aug 10 '18 at 11:21
@MegaIng: Why would you multiply insert and getitem operation? – Håken Lid Aug 10 '18 at 11:27
@HåkenLid Because you have to do a dict insert for every element of the input array. That is not quite correct with the multiplication, but I don't know how to do that correct. But it is not O(n^2), It could be something like O(n+log(n)) Or something like that. But most certainly it is O(n) – MegaIng Aug 10 '18 at 11:32
I'm not exactly sure how it's implemented, but I think we only need one get or insert operation. In any case, O(n) worst case performance for hash table is only a theoretical possibility. Typical case is O(1). https://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1?noredirect=1&lq=1 – Håken Lid Aug 10 '18 at 11:46

score 1 · Answer 2 · answered Aug 10 '18 at 11:15

You will need to use Counter from collections module. The Counter class is not hashable so we will make it hashable dict from it.

from collections import Counter, defaultdict


class hashablecounter(Counter):
    def __hash__(self):
        return hash(tuple(sorted(self.items())))


d = defaultdict(list)
arr=['act','cat','tac','bad','fad']

for a in arr:
    d[hashablecounter(a)].append(a)

s = 'cat'
print('Anagrams for ', s, ' are ', d[hashablecounter(s)])

How to search for anagrams in O(logN) time given an input from the user?

2 Answers2