I have multiple lists of words, such as:

a = ['java','java 2','javascript','java']
b = ['finds','finding','afinder what','find']

I want to give each list a kind of 'name'. Until now I have been using the following function, which returns the most frequent word in the list:

def findMostFrequentWord(alist):
    listOfWords = ' '.join(alist).split(' ')
    mostFrequent = max(listOfWords,key=listOfWords.count)
    return mostFrequent

>>> findMostFrequentWord(a)
'java'
>>> findMostFrequentWord(b)
'finds'

But I was wondering whether there is a way to output the most frequent sequence of letters instead. For example, list b should give the word find, since the sequence of letters f, i, n, d is the most frequent across the different values. I can't think of a method to check that, and it also seems fairly expensive from a computational perspective. Any comments?
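One rough way to make the idea concrete is sketched below. It is only an illustration under stated assumptions: the function name most_common_substring and the min_len cutoff are hypothetical, a sequence is counted once per string that contains it, and ties go to the longer sequence. It enumerates every substring, so the cost grows roughly with the square of each string's length, which should be acceptable for short words but not for long texts.

```python
from collections import Counter


def most_common_substring(words, min_len=3):
    """Return the substring (length >= min_len) contained in the most strings.

    Assumptions for this sketch: ties are broken in favour of the longer
    substring, then alphabetically. Returns None when no string is at
    least min_len characters long.
    """
    counts = Counter()
    for word in words:
        # Collect each distinct substring once per string, so repeats
        # inside a single string do not inflate its count.
        seen = {
            word[i:j]
            for i in range(len(word))
            for j in range(i + min_len, len(word) + 1)
        }
        counts.update(seen)
    if not counts:
        return None
    # Highest string count first, then longest, then alphabetical order.
    return min(counts, key=lambda s: (-counts[s], -len(s), s))


a = ['java', 'java 2', 'javascript', 'java']
b = ['finds', 'finding', 'afinder what', 'find']
print(most_common_substring(a))  # java
print(most_common_substring(b))  # find
```

Without a minimum length the shorter sequences would always tie with or beat the longer ones, which is why the cutoff and the longest-first tie-break are both needed here.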

Mpizos Dimitris
  • So, to be clear, you are searching for the longest string in the set of the most common substrings? – timgeb Feb 26 '16 at 10:50
  • @timgeb Initially my thought was 'the most common set of letters between different strings', but your question made me realize that the result would probably be a single letter. So something like 'the most common set of letters between different strings, with a threshold on the length of the sequence'. There is probably a better way to 'name' a list of values, but this is what came to mind. If you think something else could work better, just recommend it :) – Mpizos Dimitris Feb 26 '16 at 11:01
