I am trying to match two lists of strings with names that are written differently and have partial matches:
list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}
I want to keep the names that appear in both lists, even when they only match partially. Desired output:
matches = {'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
I was using this code from another Stack Overflow question, but it doesn't work for my case:
    lst = []
    for i in list1:
        has_match = False
        for j in list2:
            if i.split()[0] in j:
                has_match = True
                print(i, j)
                if j not in lst:
                    lst.append(j)
            if len(i) > 1:
                k = ' '.join(i.split()[:2])
                if k in j:
                    has_match = True
                    print(i, j)
                    if j not in lst:
                        lst.append(j)
        if not has_match:
            lst.append(i + ' - not found')
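
To make the desired behavior concrete, here is a minimal sketch of one way it might be done. It assumes that two names "partially match" when every word of one name also appears in the other, regardless of word order, letter case, or trailing punctuation such as the dot in `P.`. The helper names `tokens` and `partial_matches` are hypothetical and only for illustration; this is not the code from the other question.

```python
def tokens(name):
    """Split a name into a set of upper-cased words, dropping '.' and ',' at word edges."""
    words = (word.strip('.,').upper() for word in name.split())
    return {word for word in words if word}


def partial_matches(names_a, names_b):
    """Return the names from names_a whose words are all contained in some
    name from names_b, or the other way around (assumed matching rule)."""
    # Pre-compute the word sets of names_b once; skip names with no usable words.
    token_sets_b = [t for t in (tokens(name) for name in names_b) if t]
    matched = set()
    for name in names_a:
        name_tokens = tokens(name)
        if not name_tokens:
            continue
        # `<=` on sets is the subset test, so word order does not matter.
        if any(name_tokens <= other or other <= name_tokens for other in token_sets_b):
            matched.add(name)
    return matched


list1 = {'ADELA SARABIA', 'JUAN PEREZ', 'JOHN ADAMS', 'TOM HANKS'}
list2 = {'JOSE GARCIA', 'HANKS TOM', 'PEREZ LOPEZ JUAN', 'JOHN P. ADAMS'}

matches = partial_matches(list1, list2)
print(sorted(matches))  # ['JOHN ADAMS', 'JUAN PEREZ', 'TOM HANKS']
```

Note that the subset rule is deliberately loose: a single-word entry such as `'JOHN'` in one list would match every name containing `JOHN` in the other, so a stricter rule (for example, requiring at least two shared words) may be needed for real data.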