
I would like to find the common string among the strings in this list: strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']

The following code returns only the first part, "PS1 1", but I expected the result to be "PS1 Test". Is it possible to get that result using SequenceMatcher? Thank you in advance!

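from difflib import SequenceMatcher

def findCommonStr(strings_list: list) -> str:

    common_str = strings_list[0]

    for i in range(1, len(strings_list)):
        # only the first matching block is kept here
        match = SequenceMatcher(None, common_str, strings_list[i]).get_matching_blocks()[0]
        # match.a indexes into common_str (the first argument)
        common_str = common_str[match.a: match.a + match.size]

    common_str = common_str.strip()

    return common_str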
Kroshka Kartoshka
Elka
  • For `['PS1 123456 Test', 'PS1 Test 454']` the answer would still be `PS1 Test`, correct? – Abhinav Mathur Oct 29 '20 at 16:33
  • `common = set.intersection(*map(set, map(str.split, strings_list)))`. – ekhumoro Oct 29 '20 at 17:53

2 Answers


You need to keep all the fragments, not only the first one:

from difflib import SequenceMatcher
from typing import List


def get_common_str(strs: List[str]) -> str:
    common_str = strs[0] if strs else ''

    for str_ in strs[1:]:
        common_str = ''.join(
            common_str[m.a:m.a + m.size]
            for m in SequenceMatcher(None, common_str, str_).get_matching_blocks()
        )

    return common_str


print(get_common_str(['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']))

which gives

PS1 2 Test

This problem is tricky, so this heuristic might not always work; feel free to come up with another one! SequenceMatcher seems to do a good job in your case, though. Because it compares character by character, we get not only the common words but also the common word fragments (here the "2" shared by all three numbers), which is quite impressive.
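
The question asked for "PS1 Test" rather than character fragments. One possible way to get that while still using SequenceMatcher is to compare lists of words instead of raw characters, since SequenceMatcher accepts any sequences of hashable items. A minimal sketch, assuming words are separated by whitespace; get_common_words is a hypothetical name, not part of difflib:

```python
from difflib import SequenceMatcher
from typing import List


def get_common_words(strs: List[str]) -> str:
    """Hypothetical word-level variant: compare lists of words, not characters."""
    common = strs[0].split() if strs else []

    for str_ in strs[1:]:
        words = str_.split()
        matcher = SequenceMatcher(None, common, words, autojunk=False)
        # Keep every matching block; each block is a run of whole words.
        common = [
            word
            for m in matcher.get_matching_blocks()
            for word in common[m.a:m.a + m.size]
        ]

    return ' '.join(common)


print(get_common_words(['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']))
```

For the example from the comments, ['PS1 123456 Test', 'PS1 Test 454'], this sketch also returns PS1 Test, because the matching blocks keep the word order of the first string.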

Kroshka Kartoshka

This approach does not use SequenceMatcher. If all the strings follow the same pattern, you can split them into words on whitespace.

strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']

test = []
for string in strings_list:
  print(string.split())
  test.append(string.split())

>>> ['PS1', '123456', 'Test']
['PS1', '758922', 'Test']
['PS1', '978242', 'Test']

Now you can simply do a set intersection to find the common elements. Reference: Python - Intersection of multiple lists?

set(test[0]).intersection(*test[1:])

>>> {'PS1', 'Test'}

# join them to get a string (a set is unordered, so the word order is not guaranteed)
' '.join(set(test[0]).intersection(*test[1:]))

>>> PS1 Test

This only works if the strings follow this pattern of words separated by whitespace.

Function:

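def findCommonStr(strings_list: list) -> str:
  if not strings_list:
    return ''

  # split every string into its whitespace-separated words
  all_str = [string.split() for string in strings_list]
  common = set(all_str[0]).intersection(*all_str[1:])

  # a set has no order, so take the word order from the first string
  return ' '.join(word for word in all_str[0] if word in common)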
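
The whitespace restriction mentioned above can be relaxed by tokenizing with a regular expression instead of str.split. This is only a sketch under stated assumptions: a "word" is taken to be a run of letters, digits, or underscores (`\w+`), and common_tokens is a hypothetical helper name. It also orders the result by the first string, because a set does not keep order.

```python
import re
from typing import List


def common_tokens(strings: List[str]) -> str:
    """Hypothetical helper: words shared by all strings, in first-string order."""
    if not strings:
        return ''

    # Assumption: a "word" is a run of letters, digits or underscores (\w+).
    token_lists = [re.findall(r'\w+', s) for s in strings]
    shared = set(token_lists[0]).intersection(*token_lists[1:])

    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ' '.join(dict.fromkeys(t for t in token_lists[0] if t in shared))


print(common_tokens(['PS1,123456,Test', 'PS1-758922-Test', 'PS1 978242 Test']))
```

Whether `\w+` is the right definition of a word depends on the data; identifiers containing dashes or dots, for example, would be split into several tokens.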
ps_ps_ps