
How do I compare several rows (strings) and find the words or combinations of words that are present in every row? Pure Python, NLTK, or anything else is fine.

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
# some magic
result = 'foo bar'
stkvtflw

4 Answers


Split each string on whitespace and store the resulting words in a set. Then compute the intersection of all the sets:

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
sets = [set(s.split()) for s in few_strings]
common_words = sets[0].intersection(*sets[1:])
print(common_words)

Output (sets are unordered, so the order may vary between runs):

{'bar', 'foo'}
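Because a set has no order, joining it does not reliably give back 'foo bar'. A minimal sketch, assuming you want the common words as one string in the order they appear in the first string: filter the first string's words against the intersection.

```python
few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')

# Words present in every string (an unordered set)
common = set.intersection(*(set(s.split()) for s in few_strings))

# Rebuild a string, keeping the order in which the words appear in the first string
result = ' '.join(word for word in few_strings[0].split() if word in common)
print(result)  # foo bar
```

Note that this finds common words, not necessarily words that are adjacent in every string, and a common word repeated in the first string is repeated in the result.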
Flursch

You might want to use the standard library's difflib module, which handles sequence comparisons, including finding the longest common substring:

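from difflib import SequenceMatcher

list_of_str = ['this is foo bar', 'this is not a foo bar', 'some other foo bar here']

# Narrow the candidate pairwise: keep the longest (character-level) substring
# that the current candidate shares with each following string.
# find_longest_match() with no arguments requires Python 3.9+.
result = list_of_str[0]
for next_string in list_of_str[1:]:
    match = SequenceMatcher(None, result, next_string).find_longest_match()
    result = result[match.a:match.a + match.size]

# Caution: this greedy approach does not always find the longest substring common
# to all strings. For this input the first comparison ties between 'this is '
# and ' foo bar', keeps the earlier 'this is ', and the final result is 'th'.

For reference, here is what find_longest_match returns for two strings:
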
from difflib import SequenceMatcher

string1 = "apple pie available"
string2 = "come have some apple pies"

match = SequenceMatcher(None, string1, string2).find_longest_match()

print(match)  # -> Match(a=0, b=15, size=9)
print(string1[match.a:match.a + match.size])  # -> apple pie
print(string2[match.b:match.b + match.size])  # -> apple pie
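The pairwise difflib loop compares characters and keeps only one match per step, so it can lock onto the wrong substring. If what you need is the longest run of consecutive words shared by every string, a brute-force sketch that does not use difflib could look like this (the helper names longest_common_phrase and contains_run are hypothetical):

```python
def contains_run(words, phrase):
    """Return True if `phrase` appears as consecutive items in `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def longest_common_phrase(strings):
    """Return the longest sequence of consecutive words found in every string."""
    tokenized = [s.split() for s in strings]
    # Any common phrase must fit inside the shortest string, so draw candidates from it
    shortest = min(tokenized, key=len)
    # Try the longest candidates first, so the first hit is a longest common phrase
    for n in range(len(shortest), 0, -1):
        for start in range(len(shortest) - n + 1):
            phrase = shortest[start:start + n]
            if all(contains_run(words, phrase) for words in tokenized):
                return ' '.join(phrase)
    return ''


few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
print(longest_common_phrase(few_strings))  # foo bar
```

This checks every word run of the shortest string against every other string, which is fine for sentence-sized inputs but grows quickly with longer texts.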
Mark
few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
  1. Create a set of words for each sentence by splitting it on whitespace
  2. Start the result with the first sentence's set
  3. Loop over the remaining sentences and replace the result with the intersection of the current result and that sentence's set
# 1.
sets = [set(s.split()) for s in few_strings]
# 2.
result = sets[0]
# 3.
for s in sets[1:]:
    result = result.intersection(s)

Now result is a Python set of the words that occur in every sentence. You can convert it to a list with:

result = list(result)

or to a string (set order is arbitrary, so the word order may vary) with:

result = " ".join(result)
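Step 3 is a fold over the list of sets, so it can also be written with functools.reduce. A short sketch of the same idea:

```python
from functools import reduce

few_strings = ('this is foo bar', 'this is not a foo bar', 'some other foo bar here')
sets = [set(s.split()) for s in few_strings]

# Fold the intersection over all sets: ((s0 & s1) & s2) ...
common = reduce(set.intersection, sets)

# Sort for a repeatable display, since set order is arbitrary
print(sorted(common))  # ['bar', 'foo']
```

Without an initial value, reduce raises TypeError on an empty list, so guard against empty input if that can happen.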
lutrarutra

You can also do it without any libraries:

few_strings = ('this is foo bar', 'some other foo bar here', 'this is not a foo bar')
strings = [s.split() for s in few_strings]
# Iterate over the shortest word list: every common word must appear in it,
# so it has the fewest candidates to check
strings.sort(key=len)
common = []

for word in strings[0]:
    # Keep the word only if every string contains it
    if all(word in string for string in strings):
        common.append(word)

result = ' '.join(common)
print(result)  # foo bar
Sidharth Mudgil