Detect implicit substring patterns in Python

Question

Given two phrases, I want to identify the parts they have in common.

For example, given the two phrases "technology learning TEL" and "learning TEL approach", I want to identify the common terms "learning TEL".

Another example: for "lightweight web applications" and "software web applications", the common terms are "web applications".

My current code uses `in` as follows.

for item1 in mylist_1:
    for item2 in mylist_2:
        if item2 in item1:
            tmp_mylist1.append(item2)
            break

print(tmp_mylist1)

However, it fails to identify the implicit common phrases in the examples above, because neither string contains the other in full:

if "technology learning TEL" in "learning TEL approach":
    print("done")
else:
    print("no")

Is there a fast way to identify these implicit common consecutive terms in Python?

I'm not sure if I understand what you mean - are you just trying to find common words in two sentences? To have a function like `common("aaa bbb ccc", "ddd aaa ccc") == ["aaa", "ccc"]`? — Dunno, Dec 28 '17 at 10:54
@Dunno that wouldn't work, because had the second example string of yours been `"ddd aaa bbb"` the OP wants the function to yield `"aaa bbb"`, not `"aaa", "bbb"`. — sjaustirni, Dec 28 '17 at 10:56
Please, check this: https://stackoverflow.com/questions/18715688/find-common-substring-between-two-strings — Fernando Ortega, Dec 28 '17 at 11:20
Do I understand correctly that the problem is [this](https://en.wikipedia.org/wiki/Longest_common_substring_problem)? — timgeb, Dec 28 '17 at 11:27
You want a suffix tree. The naive brute-force approach quickly crumbles on any non-toy input. — tripleee, Dec 28 '17 at 11:40

Arount · Accepted Answer · 2017-12-28T11:34:36.723

There is surely a quicker way to do this, but since nobody has replied yet, here is a solution:

import itertools

def best_combination(string1, string2):
    '''
    Gives best words combinations within both strings
    '''
    words = string1.split()
    # All possible solutions for a case
    solutions = []

    # Loop to increment number of words combination to test
    for i in range(1, len(words) + 1):
        # get all possible combinations according to current number of words to test
        possibilities = list(itertools.combinations(words, i))

        # Test all possibilities
        for possibility in possibilities:
            tested_string = ' '.join(possibility)

            # If it matches, add it to the solutions list
            if tested_string in string2:
                solutions.append(tested_string)

    # Best solution is the longest; return '' when nothing matches
    solutions.sort(key=len, reverse=True)
    return solutions[0] if solutions else ''


print(best_combination('technology learning TEL', 'learning TEL approach'))
print(best_combination('aaa bbb ccc', 'bbb ccc'))
print(best_combination('aaa bbb ccc', 'aaa bbb ccc'))
print(best_combination('aaa bbb ccc', 'ccc bbb'))

Output:

learning TEL
bbb ccc
aaa bbb ccc
bbb

More about itertools.combinations

EDIT

Same thing in fewer lines:

def best_combination(string1, string2):
    '''
    Gives best words combinations within both strings
    '''
    words = string1.split()
    solutions = []

    tests = sum([list(itertools.combinations(words, i)) for i in range(1, len(words) + 1)], [])
    for test in tests:
        if ' '.join(test) in string2:
            solutions.append(' '.join(test))
    solutions.sort(key=len, reverse=True)
    # Return '' instead of raising IndexError when nothing matches
    return solutions[0] if solutions else ''
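
A note on the approach above: `itertools.combinations` also yields non-adjacent word tuples (such as `aaa ccc`), and the number of tuples grows exponentially with the number of words. Below is a minimal sketch of an alternative, assuming the goal is the longest run of consecutive whole words shared by both strings, i.e. the longest common substring problem mentioned in the comments, applied to word lists. The name `longest_common_words` is hypothetical and not part of the answer.

```python
def longest_common_words(string1, string2):
    """Return the longest run of consecutive words found in both strings.

    Returns an empty string when the strings share no word.
    """
    words1 = string1.split()
    words2 = string2.split()

    # previous[j] holds the length of the common run that ends at
    # words1[i - 2] and words2[j - 1] (the row for the previous word of string1)
    previous = [0] * (len(words2) + 1)
    best_length = 0
    best_end = 0  # index in words1 just past the best run found so far

    for i, word1 in enumerate(words1, start=1):
        current = [0] * (len(words2) + 1)
        for j, word2 in enumerate(words2, start=1):
            if word1 == word2:
                # Extend the run that ended at the previous pair of words
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length = current[j]
                    best_end = i
        previous = current

    return ' '.join(words1[best_end - best_length:best_end])


print(longest_common_words('technology learning TEL', 'learning TEL approach'))
print(longest_common_words('lightweight web applications', 'software web applications'))
```

This compares whole words rather than characters, so a partial word in one string does not count as a match, and the work grows with the product of the two word counts rather than exponentially. When several runs tie for the longest, the first one found in `string1` is returned.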

score 0 · Answer 2 · answered Dec 28 '17 at 12:15

I used this method and it worked:

def AnalyzeTwoExpr(expr1,  expr2): #Case sensitive
    commonExpr = []
    a = expr1.split(' ') #splits each expression into an array of words
    b = expr2.split(' ') #splits each expression into an array of words
    for word1 in a:
        for word2 in b:
            if(word1 == word2):
                commonExpr.append(word1)

    return commonExpr

This function returns a list of all the words that appear in both expressions. It takes two required arguments: the two strings to analyze.

Here is a version that is not case sensitive:

def AnalyzeTwoExpr(expr1,  expr2): #Not case sensitive
    commonExpr = []
    a = expr1.split(" ")
    b = expr2.split(" ")
    for word1 in a:
        for word2 in b:
            w1 = word1.lower()
            w2 = word2.lower()
            if(w1 == w2):
                commonExpr.append(w1)

    return commonExpr

Hope this works for you.
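
Note that the nested loops above append a word once for every matching pair, so a word that is repeated in either expression shows up several times in the result. A minimal sketch of a variant that keeps each common word once, in the order it appears in the first expression, assuming that is the desired behaviour; the name `common_words` and the `case_sensitive` parameter are hypothetical:

```python
def common_words(expr1, expr2, case_sensitive=True):
    """Return the words of expr1 that also occur in expr2, in order, without repeats."""
    if not case_sensitive:
        expr1, expr2 = expr1.lower(), expr2.lower()

    words2 = set(expr2.split())  # set membership tests avoid the inner loop
    seen = set()
    result = []
    for word in expr1.split():
        if word in words2 and word not in seen:
            seen.add(word)
            result.append(word)
    return result


print(common_words('aaa bbb ccc', 'ddd aaa ccc'))
print(common_words('Learning TEL', 'learning tel approach', case_sensitive=False))
```

Like the functions above, this returns individual words and does not group consecutive words into phrases.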
