I have a list of project names that I have tried to clean up, but it still contains near-duplicates that differ only slightly. I want to find the nearest match for each name and replace all occurrences with that match.

I am using Python and pandas and have imported a file with a column in which the project names are embedded. I did some cleaning and removed extra characters to extract the project names, but some names still occur with minor mismatches. I used difflib to find the closest match, but it returns two values and the best match is the name itself.

      Project Name  
552   Hilton International
553   Hilton International A

import difflib

# Take the name at position 552 and look it up against the whole column
key = df2['Project Name'].iloc[552]
difflib.get_close_matches(key, df2['Project Name'].tolist())

expected result:

      Project Name  
552   Hilton International
553   Hilton International
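
The best match comes back as the key itself because the key is also in the list of candidates, so it scores a perfect 1.0 against its own entry. One possible way to get the expected result above is sketched below, using only difflib. The helper name `build_mapping`, the 0.8 cutoff, and the rule "the shortest variant is the canonical one" are all assumptions made for illustration, not something stated in the question; a different rule (for example, most frequent spelling wins) may suit the real data better.

```python
import difflib


def build_mapping(names, cutoff=0.8):
    """Map each distinct name to a canonical spelling.

    Assumption: among near-duplicates, the shortest variant is the one to keep.
    Names are visited shortest first and compared only against the canonical
    names accepted so far, so a name is never matched against itself.
    """
    canonical = []
    mapping = {}
    for name in sorted(set(names), key=lambda s: (len(s), s)):
        match = difflib.get_close_matches(name, canonical, n=1, cutoff=cutoff)
        if match:
            # Close enough to an existing canonical name: reuse it.
            mapping[name] = match[0]
        else:
            # Nothing similar seen yet: this name becomes canonical.
            canonical.append(name)
            mapping[name] = name
    return mapping


# Hypothetical sample data; the third name is invented for the example.
names = ["Hilton International", "Hilton International A", "Marriott Downtown"]
mapping = build_mapping(names)
print(mapping["Hilton International A"])  # Hilton International
```

With a mapping like this, the column could then be rewritten with something like df2['Project Name'] = df2['Project Name'].map(mapping). The cutoff probably needs tuning: too low and genuinely different projects get merged, too high and variants with larger differences are missed.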
inspectorG4dget
AdnanTC

0 Answers