How to extract common elements over two lists of strings - python

Question

I am trying to match two lists of strings with names that are written differently and have partial matches:

list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}

I want to keep the names that appear in both lists even if they only match partially. Desired output:

matches = {'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}

I was using this code from another Stack Overflow question, but it doesn't work in my case:

lst = []
for i in list1:
    has_match = False
    for j in list2:
        if i.split()[0] in j:
            has_match = True
            print(i, j)
            if j not in lst:
                lst.append(j)
        if len(i) > 1:
            k = ' '.join(i.split()[:2])
            if k in j:
                has_match = True
                print(i, j)
                if j not in lst:
                    lst.append(j)
    if not has_match:
        lst.append(i + ' - not found')

You might need other special cases, like potentially ignoring a middle name or initial. The Levenshtein distance algorithm may or may not help too, depending on what sort of differences you might get. – doctorlove Aug 24 '23 at 13:27
Does this answer your question? [How to retrieve partial matches from a list of strings](https://stackoverflow.com/questions/64127075/how-to-retrieve-partial-matches-from-a-list-of-strings) – JeffUK Aug 24 '23 at 13:27
@JeffUK Thank you for your comment, but I can't use startswith with a list of strings. – Daniela saba rosner Aug 24 '23 at 13:57
The linked answer also includes an `in` option. Note that it uses a filter, so the test is applied to each element, not to the list of strings as a whole. – JeffUK Aug 24 '23 at 14:26
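The fuzzy-matching idea raised in the comments could be sketched with the standard library's `difflib`. Note that this uses difflib's similarity ratio, not the Levenshtein distance, and that the helper names (`words`, `is_fuzzy_match`, `fuzzy_matches`) and the 0.75 cutoff are assumptions chosen for illustration, not something from the question:

```python
from difflib import get_close_matches

def words(name):
    # Uppercase, drop periods (so 'P.' becomes 'P'), and split on whitespace.
    return name.upper().replace('.', '').split()

def is_fuzzy_match(name1, name2, cutoff=0.75):
    # Every word of the shorter name needs a close match among the
    # words of the longer name. The cutoff is an assumed value.
    short, long_ = sorted((words(name1), words(name2)), key=len)
    return bool(short) and all(
        get_close_matches(word, long_, n=1, cutoff=cutoff) for word in short
    )

def fuzzy_matches(names1, names2, cutoff=0.75):
    # Keep each name from names1 that fuzzily matches some name in names2.
    return {a for a in names1 if any(is_fuzzy_match(a, b, cutoff) for b in names2)}

list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}

print(sorted(fuzzy_matches(list1, list2)))
# ['JOHN ADAMS', 'JUAN PEREZ', 'TOM HANKS']

# 'JUAN PERES' is a made-up misspelling, used only to show the tolerance.
print(is_fuzzy_match('JUAN PERES', 'PEREZ LOPEZ JUAN'))  # True
```

Comparing word by word keeps word order and extra middle names from affecting the score. A lower cutoff tolerates more typos but also lets more unrelated names through.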

B Remmelzwaal · Answer 1 · 2023-08-24T14:21:38.983

0

My first idea is to split the names and use a set intersection to determine partial matches (assuming each name in a full name is unique):

list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}
matches = []

for name1 in list1:
    split1 = set(name1.split(' '))
    for name2 in list2:
        split2 = set(name2.split(' '))
        if split1.intersection(split2) == min(split1, split2, key=len):
            matches.append(name1)
            break

print(set(matches))

Output:

{'JUAN PEREZ', 'TOM HANKS', 'JOHN ADAMS'}
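As a variation on the same idea, the comparison against `min(..., key=len)` can be expressed with the set subset operator `<=`, checked in both directions so that either list may hold the longer name. This is only a sketch; the function names and the choice to strip a trailing period from initials are assumptions:

```python
def name_tokens(name):
    # Split on any whitespace and drop a trailing period so 'P.' and 'P' compare equal.
    return {word.rstrip('.') for word in name.split()}

def partial_matches(names1, names2):
    matches = set()
    for name1 in names1:
        tokens1 = name_tokens(name1)
        for name2 in names2:
            tokens2 = name_tokens(name2)
            # <= is the subset test: every word of one name appears in the other.
            if tokens1 <= tokens2 or tokens2 <= tokens1:
                matches.add(name1)
                break
    return matches

list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}

print(sorted(partial_matches(list1, list2)))
# ['JOHN ADAMS', 'JUAN PEREZ', 'TOM HANKS']
```

Calling `name.split()` without an argument splits on any run of whitespace, so double spaces or tabs between words do not produce empty tokens.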

edited Aug 24 '23 at 14:21

answered Aug 24 '23 at 13:25

B Remmelzwaal


Thank you for your answer, but what if I have this list: list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS HARDY'}? The output drops the TOM HANKS match. Is it possible to add more separators (' ')? – Daniela saba rosner Aug 24 '23 at 13:45
@Danielasabarosner I assumed from your example that `list1` would always contain the shorter names. I have updated my answer accordingly. – B Remmelzwaal Aug 24 '23 at 14:22

score 0 · Answer 2 · answered Aug 24 '23 at 14:03

Use the below list comprehension to get your result:

[name for name in list1 if any([any([part_name in other_name for other_name in list2]) for part_name in name.split()])]

Output:

['JUAN PEREZ', 'TOM HANKS', 'JOHN ADAMS']
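One thing worth knowing about this comprehension is that it keeps a name as soon as any single word of it occurs as a substring of any name in `list2`. The sketch below restates the same logic with generator expressions and runs it on three made-up names ('JOHN SMITH', 'DAM LEE', 'MARY JONES' are hypothetical, not from the question) to show how loose that test is:

```python
def any_part_matches(names1, names2):
    # Same logic as the comprehension above: keep a name if any of its
    # words is a substring of any name in names2.
    return [name for name in names1
            if any(part in other for part in name.split() for other in names2)]

list2 = ['JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS']

# Hypothetical inputs, chosen only to illustrate the behaviour.
print(any_part_matches(['JOHN SMITH', 'DAM LEE', 'MARY JONES'], list2))
# ['JOHN SMITH', 'DAM LEE']
```

'JOHN SMITH' is kept because 'JOHN' alone matches, and 'DAM LEE' is kept because 'DAM' is a substring of 'ADAMS'. If that is too permissive for your data, comparing whole words instead of substrings is stricter.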

Sai Srinivas


score 0 · Accepted Answer · answered Aug 24 '23 at 17:22

This works exactly as I expected:

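# list1 and list2 are the sets from the question.

def calculate_similarity(string1, string2):
    # Fraction of the shorter name's words that also appear in the other name.
    words1 = set(string1.split())
    words2 = set(string2.split())
    common_words = words1 & words2
    similarity = len(common_words) / min(len(words1), len(words2))
    return similarity

matches = set()

for item1 in list1:
    best_similarity = 0
    best_match = None

    # Find the name in list2 that shares the most words with item1.
    for item2 in list2:
        similarity = calculate_similarity(item1, item2)
        if similarity > best_similarity:
            best_similarity = similarity
            best_match = item2

    if best_similarity > 0.7:  # Adjust the threshold as needed
        # best_match is the spelling from list2; add item1 instead to keep list1's spelling.
        matches.add(best_match)

print("Matches:", matches)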
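If both spellings are needed, one possible variation is to return a mapping from each `list1` name to its best counterpart in `list2`. This is a sketch under the same word-overlap score and the same 0.7 threshold; the names `overlap` and `best_matches` are assumptions, not part of the answer above:

```python
def overlap(name1, name2):
    # Shared words divided by the word count of the shorter name.
    words1, words2 = set(name1.split()), set(name2.split())
    if not words1 or not words2:
        return 0.0
    return len(words1 & words2) / min(len(words1), len(words2))

def best_matches(names1, names2, threshold=0.7):
    # Map each name in names1 to its highest-scoring name in names2,
    # keeping only pairs whose score exceeds the threshold.
    if not names2:
        return {}
    result = {}
    for name1 in names1:
        best = max(names2, key=lambda name2: overlap(name1, name2))
        if overlap(name1, best) > threshold:
            result[name1] = best
    return result

list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}

for name1, name2 in sorted(best_matches(list1, list2).items()):
    print(name1, '->', name2)
# JOHN ADAMS -> JOHN P. ADAMS
# JUAN PEREZ -> PEREZ LOPEZ JUAN
# TOM HANKS -> HANKS TOM
```

The keys give the desired output from the question, and the values show which entry in `list2` each one was paired with.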

