
I have a dirty dataframe `d` (the top one in the screenshot) and a reference dataframe `r`, both consisting of the following fields:

[Screenshot: dirty dataframe `d` (top) and reference dataframe `r` (bottom)]

The `d` dataframe has misspelled author names, publishers, and book names: sometimes only one field is misspelled, sometimes several. I want to match each row of `d` to a row of `r` with a similarity confidence level, and produce a new row for each match, in which the book name, author, and publisher must all be similar. Which NLP method should I use? Can you give me a hint? I am new to data science and ML.
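A minimal sketch of one possible approach: plain string similarity rather than an NLP model. It assumes hypothetical column names `book_name`, `author`, and `publisher`, and uses the standard library's `difflib.SequenceMatcher` to score each field in [0, 1]. The per-field scores are combined with `min`, so a reference row only gets a high confidence when all three fields are similar; the 0.8 threshold and the example rows are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical column names; adjust to the real ones in d and r.
FIELDS = ("book_name", "author", "publisher")


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] from difflib.SequenceMatcher."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(dirty_row, reference_rows, threshold=0.8):
    """Find the reference row closest to dirty_row.

    The confidence is the *minimum* per-field similarity, so a row only
    scores high when book name, author and publisher are all similar.
    Returns (row, confidence); row is None if confidence < threshold.
    """
    best_row, best_score = None, 0.0
    for ref in reference_rows:
        score = min(similarity(dirty_row[f], ref[f]) for f in FIELDS)
        if score > best_score:
            best_row, best_score = ref, score
    if best_score < threshold:
        return None, best_score
    return best_row, best_score


# Made-up example rows; with pandas they could come from r.to_dict("records").
r_rows = [
    {"book_name": "The Hobbit", "author": "J.R.R. Tolkien", "publisher": "Allen & Unwin"},
    {"book_name": "Dune", "author": "Frank Herbert", "publisher": "Chilton Books"},
]
dirty = {"book_name": "The Hobit", "author": "J.R.R. Tolkein", "publisher": "Allen and Unwin"}

match, confidence = best_match(dirty, r_rows)
print(match["book_name"], round(confidence, 2))
```

This compares every row of `d` with every row of `r`, which is fine for small tables but grows quadratically. Averaging the field scores instead of taking the minimum would tolerate one badly garbled field at the cost of more false matches. If word order varies (e.g. "Tolkien, J.R.R." vs "J.R.R. Tolkien"), FuzzyWuzzy's `fuzz.token_sort_ratio` could replace `similarity`, keeping in mind it returns 0 to 100 rather than 0 to 1.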

geekmangnu
  • Probably best to do a fuzzy match on the `book_name`. Can you provide your data as text ([mcve]) as well as what you've tried? Others will surely be able to help you then. – Umar.H Dec 31 '19 at 13:18
  • Fuzzy match is the keyword I was searching for; I am looking at the FuzzyWuzzy Python package now. I hope to post my own answer when it is ready. Thank you. – geekmangnu Jan 01 '20 at 19:21
  • Does this answer your question? [How to compare a value in one dataframe to a column in another using fuzzywuzzy ratio](https://stackoverflow.com/questions/59312265/how-to-compare-a-value-in-one-dataframe-to-a-column-in-another-using-fuzzywuzzy) – SchwarzeHuhn Jan 07 '20 at 21:53
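Following the suggestion to fuzzy-match on `book_name` alone, here is a small sketch of a single-field lookup. FuzzyWuzzy's `process.extractOne` serves this purpose; the version below swaps in the standard library's `difflib.get_close_matches` so it runs without extra packages. The function name `match_book_name`, the 0.8 cutoff, and the example titles are hypothetical.

```python
from difflib import get_close_matches


def match_book_name(dirty_name, reference_names, cutoff=0.8):
    """Return the closest reference book name, or None below the cutoff.

    get_close_matches scores candidates with SequenceMatcher.ratio() and
    keeps only those scoring at least `cutoff`.
    """
    # Compare lower-cased titles, but return the reference's original spelling.
    by_lower = {name.lower(): name for name in reference_names}
    hits = get_close_matches(dirty_name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


# Hypothetical reference titles.
titles = ["The Hobbit", "The Lord of the Rings", "Dune"]
print(match_book_name("The Lord of the Rigns", titles))
print(match_book_name("Moby Dick", titles))
```

Matching on the title alone is cheap, but two different books with similar titles (or one title from several publishers) would collide, which is why checking author and publisher as well is worth the extra work.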

0 Answers