I have two strings and I want to find all the common words. For example,

s1 = 'Today is a good day, it is a good idea to have a walk.'

s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'

Consider matching s1 against s2.

'Today is' matches 'today is', but 'Today is a' does not match anything in s2. Therefore, 'Today is' is one of the common runs of consecutive words. Similarly, we have 'a good day', 'is', 'a good', and 'have a walk'. So the common words are

common = ['today is', 'a good day', 'is', 'a good', 'have a walk']

Can we use a regular expression to do that?

Thank you very much.

Frankie
  • Are you looking for common words or common phrases? Are you trying to avoid double-counting matches? A phrase such as "a good day" could be broken up into just "good", which would be evaluated again. – deko Aug 28 '17 at 03:04
  • Your criteria need tightening: for instance, `Today` in s1 and `Yesterday` in s2 have `day` in common – Reblochon Masque Aug 28 '17 at 03:05
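
One way to tighten the criteria raised in these comments is to match whole words rather than characters. The sketch below is only an illustration: `occurs_as_words` is a hypothetical helper, not something from the question, and it assumes a phrase counts as common only when it appears in the other string on word boundaries, ignoring case.

```python
import re

s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'


def occurs_as_words(phrase, text):
    """Return True if phrase appears in text as whole words (case-insensitive)."""
    # Hypothetical helper: \b anchors the phrase on word boundaries, so a
    # fragment such as 'yester' does not match inside 'Yesterday'.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(occurs_as_words('Today is', s2))    # True: 'today is' occurs in s2
print(occurs_as_words('Today is a', s2))  # False: s2 continues with 'good'
print(occurs_as_words('yester', s2))      # False: only part of a word
```

This only checks one candidate phrase at a time; finding all common phrases still needs a loop over the words of both strings.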

2 Answers

import string

s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
z = []  # collected common phrases
# remove punctuation (Python 3 form of str.translate)
table = str.maketrans('', '', string.punctuation)
s1 = s1.translate(table)
s2 = s2.translate(table)
print(s1)
print(s2)
sw1 = s1.lower().split()  # split into lowercase words
sw2 = s2.lower().split()
print(sw1, sw2)
i = 0
while i < len(sw1):  # a while loop, because i is changed inside the body
    x = 0   # number of words matched in the current run
    r = ""  # current run of matching words
    d = i   # position in sw1 where this pass started
    for j in range(len(sw2)):
        # the bounds check prevents an IndexError when a run reaches the end of sw1
        if i < len(sw1) and sw1[i] == sw2[j]:
            r = r + ' ' + sw2[j]  # words match: extend the run
            x += 1
            i += 1
        else:
            if x > 0:  # the run ended: store it and restart from d
                z.append(r)
                i = d
                r = ""
                x = 0
    if x > 0:  # a run that reaches the end of sw2
        z.append(r)
        i = d
    i += 1
print(list(set(z)))  # remove duplicates

# O(len(sw1) * len(sw2)) word comparisons
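
For comparison, here is a sketch of a different technique from the loops above: a dynamic-programming pass over word pairs that records every maximal run of consecutive common words. The function name `common_word_runs` and the tokenizing regex are assumptions made for illustration, not part of this answer.

```python
import re


def common_word_runs(text1, text2):
    """Return the set of maximal runs of consecutive words shared by both texts."""
    # Assumption: a "word" is a run of letters or apostrophes, compared in lowercase.
    w1 = re.findall(r"[a-z']+", text1.lower())
    w2 = re.findall(r"[a-z']+", text2.lower())
    runs = set()
    prev = [0] * (len(w2) + 1)  # run lengths for the previous word of w1
    for i, word in enumerate(w1):
        cur = [0] * (len(w2) + 1)
        for j, other in enumerate(w2):
            if word != other:
                continue
            # cur[j + 1] is the length of the common run ending at w1[i] and w2[j]
            cur[j + 1] = prev[j] + 1
            # The run is maximal if the next pair of words does not match.
            at_end = i + 1 == len(w1) or j + 1 == len(w2)
            if at_end or w1[i + 1] != w2[j + 1]:
                length = cur[j + 1]
                runs.add(' '.join(w1[i - length + 1:i + 1]))
        prev = cur
    return runs


s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(sorted(common_word_runs(s1, s2)))
# ['a', 'a good', 'a good day', 'good', 'have a walk', 'is', 'today is']
```

Besides the five phrases listed in the question, this also reports the single words 'a' and 'good', because each of them lines up on its own somewhere in the two strings.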
akp

This is based on the answer to "Find common substring between two strings".

I modified a few lines and added a few more. The modification is that the function returns answer = "NULL" by default when no common substring is found.

I also added a loop that keeps searching until it gets "NULL" and stores each result in a list.

def longestSubstringFinder(string1, string2):
    answer = "NULL"
    len1, len2 = len(string1), len(string2)
    for i in range(len1):
        match = ""
        for j in range(len2):
            if (i + j < len1 and string1[i + j] == string2[j]):
                match += string2[j]
            else:
                if (len(match) > len(answer)): answer = match
                match = ""
    return answer


mylist = []

def call():
    s1 = 'Today is a good day, it is a good idea to have a walk.'
    s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
    s1 = s1.lower()
    s2 = s2.lower()
    # keep extracting the longest common substring until none is left
    x = longestSubstringFinder(s2, s1)
    while x != "NULL":
        print(x)
        mylist.append(x)
        s2 = s2.replace(x, ' ')  # blank out the match so it is not found again
        x = longestSubstringFinder(s2, s1)

call()
print ('[%s]' % ','.join(map(str, mylist)))

Output

[ a good day, , have a walk,today is , good]
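
The same search-until-nothing-is-left idea can also be sketched at the word level with the standard library's `difflib.SequenceMatcher`. This is a swapped-in technique, not the function used in this answer; the name `repeated_longest_word_match` and the tokenizing regex are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def repeated_longest_word_match(text1, text2):
    """Collect longest common word runs, removing each one from text2's words."""
    # Assumption: a "word" is a run of letters or apostrophes, compared in lowercase.
    a = re.findall(r"[a-z']+", text1.lower())
    b = re.findall(r"[a-z']+", text2.lower())
    found = []
    while True:
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size == 0:  # nothing in common is left
            break
        found.append(' '.join(a[m.a:m.a + m.size]))
        # Replace the matched words with a placeholder so that their
        # neighbours cannot join up into a new, spurious run.
        b[m.b:m.b + m.size] = [None]
    return found


s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'
print(repeated_longest_word_match(s1, s2))
# ['a good day', 'have a walk', 'today is', 'good']
```

Working on word lists instead of characters avoids the stray spaces and commas that appear in the character-based output.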

The output of call() differs from your expected result:

common = ['today is', 'a good day', 'is', 'a good', 'have a walk']

Your expectation of a second "is" is wrong: as you can see, s2 contains only one "is".
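
A quick way to check this is to count whole-word occurrences in each string. The helper `count_word` below is a hypothetical name used only for this sketch.

```python
import re

s1 = 'Today is a good day, it is a good idea to have a walk.'
s2 = 'Yesterday was not a good day, but today is good, shall we have a walk?'


def count_word(word, text):
    """Count whole-word, case-insensitive occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


print(count_word('is', s1))  # 2
print(count_word('is', s2))  # 1
```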

Hariom Singh
  • Thank you, Hariom Singh, you are correct. – Frankie Aug 28 '17 at 04:30
  • The program does not work for this input: s1 = 'Today is a good day, it is a good idea to have a walk.', s2 = 'Yesterday was not a good day, but today is a good day, shall we have a walk?' – Poonam Aug 28 '17 at 04:47
  • @Poonam It works perfectly. Did you execute the call() function? – Hariom Singh Sep 01 '17 at 04:28
  • @Poonam https://trinket.io/python/85d3dafec4 – Hariom Singh Sep 01 '17 at 04:30
  • @https://stackoverflow.com/users/7590993/hariom-singh - Yeah, I thought a recurrence of the same words was not allowed. For example, once I get "Today is a good day" as the longest string, "a good" should not repeat. But as per the question, your logic is working fine. – Poonam Sep 01 '17 at 04:45
  • @Poonam It starts with the longest match and ends with the shortest. A few modifications will lead to your goal – Hariom Singh Sep 01 '17 at 04:56