I have a list, uniqueWordList, containing a lot of words (100,000+). The trigrams of every one of those words are in the set allTriGrams.

I want to build a dictionary that has every unique trigram as a key and, as its value, the list of all words that contain that trigram.

Example:

epicDict = {'ban': ['banana', 'banned'], 'nan': ['banana']}

My code so far:

epicDict = {}
for value in allTriGrams:
    for word in uniqueWordList:
        if value in word:
            epicDict.setdefault(value, []).append(word)

My problem: This method takes a LOT of time. Is there any way to speed up this process?

klabanus

2 Answers

What if uniqueWordList were a set instead? Then you could do this:

if value in uniqueWordList:
    epicDict.setdefault(value,[]).append(word)

Check this out: Python Sets vs Lists
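
As a small illustration of what `in` does on each container, here is a sketch that uses the two example words from the question (the variable names `words`, `trigram`, `is_whole_word` and `matching_words` are made up for this example). Membership in a set is a fast hash lookup, but it only matches whole elements, so it tests something different from the substring check `value in word`:

```python
# Example data taken from the question; the names below are illustrative only.
words = {"banana", "banned"}
trigram = "ban"

# Set membership: a hash lookup that asks "is this exact string an element?"
is_whole_word = trigram in words

# Substring test: asks "does this word contain the trigram?" for each word.
matching_words = sorted(w for w in words if trigram in w)

print(is_whole_word)    # False, because "ban" is not itself one of the words
print(matching_words)   # ['banana', 'banned']
```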

idjaw

Among simple solutions, I expect this to be faster:

import collections

epicDict = collections.defaultdict(set)
for word in uniqueWordList:
    # Slide a 3-character window over the word to get its trigrams.
    for x in range(len(word) - 2):
        epicDict[word[x:x+3]].add(word)
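
For reference, the loop above can be wrapped into a self-contained sketch and run on the example words from the question. The function name `build_trigram_index` is hypothetical, and the sketch assumes every trigram is a run of three consecutive characters of a word. Each word is scanned once, so the work grows with the total number of characters rather than with the number of trigrams times the number of words:

```python
from collections import defaultdict


def build_trigram_index(words):
    """Map each trigram to the set of words that contain it.

    Hypothetical helper for illustration; it is not part of the original code.
    """
    index = defaultdict(set)
    for word in words:
        # Words shorter than 3 characters yield an empty range and add nothing.
        for i in range(len(word) - 2):
            index[word[i:i + 3]].add(word)
    return index


# Example words taken from the question.
uniqueWordList = ["banana", "banned"]
epicDict = build_trigram_index(uniqueWordList)

print(sorted(epicDict["ban"]))  # ['banana', 'banned']
print(sorted(epicDict["nan"]))  # ['banana']
```

The values here are sets rather than lists, so a word in which the same trigram occurs twice (such as 'ana' in 'banana') is stored only once. If lists are required, `{k: sorted(v) for k, v in epicDict.items()}` converts the result.
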
Julian Go