fuzzy match between 2 columns (Python)

Question

I have a pandas DataFrame called "df_combo" with the columns "worker_id", "url_entrance", and "company_name". I am trying to produce an output column that tells me whether the URL in the "url_entrance" column contains any word from the "company_name" column. Even a close match, like the scores fuzzywuzzy produces, would work.

For example, if the URL is "www.grandhotelseattle.com" and the "company_name" is "Hotel Prestige Seattle", then the fuzz ratio might be somewhere in the 70-80 range.

I have tried the following script: `fuzz.ratio(df_combo['url_entrance'], df_combo['company_name'])`, but it returns only one number, which is the overall fuzz ratio for the whole column. I would like a fuzz ratio for every row, with those ratios stored in a new column.

Here is a possibly related question on SO: [create new column in dataframe using fuzzywuzzy](http://stackoverflow.com/questions/36138886/create-new-column-in-dataframe-using-fuzzywuzzy) – agg3l, Oct 20 '16 at 00:45
Not exactly the same, but related. There the resulting table will have the length of the original table squared. (That's a dimension more than a column...) – kpie, Oct 20 '16 at 02:12
@agg3l, I checked that link and ran the scripts, but got an error saying: "TypeError: ("object of type 'float' has no len()", u'occurred at index 3206')". – Stanleyrr, Oct 20 '16 at 02:41
Share df_combo.head() so that we can better visualize your df and your issue. – Zeugma, Oct 20 '16 at 04:39

score 4 · Answer 1 · answered Oct 20 '16 at 20:09

Thanks everyone for your input. I have solved my problem! The link that "agg3l" provided was helpful. The "TypeError" I saw occurred because "url_entrance" or "company_name" contained float values in some rows. I converted both columns to strings using the following script, re-ran the fuzz.ratio script from the linked question, and got it to work!

    df_combo['url_entrance'] = df_combo['url_entrance'].astype(str)
    df_combo['company_name'] = df_combo['company_name'].astype(str)
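For anyone landing here later, this is a minimal sketch of the whole flow: cast both columns to strings, then score each row and store the result in a new column. The sample rows and the "fuzz_ratio" column name are made up for illustration. The `difflib` fallback is only an assumption so the sketch still runs when fuzzywuzzy is not installed, and its scores may differ from `fuzz.ratio`.

```python
import pandas as pd

try:
    from fuzzywuzzy import fuzz  # library used in the question
    ratio = fuzz.ratio
except ImportError:
    # Assumption: stand-in scorer so the sketch runs without fuzzywuzzy.
    # Its scores are not guaranteed to match fuzz.ratio.
    from difflib import SequenceMatcher

    def ratio(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# Hypothetical sample data; the last company_name is a float,
# the kind of value that triggers "object of type 'float' has no len()".
df_combo = pd.DataFrame({
    "worker_id": [1, 2, 3],
    "url_entrance": [
        "www.grandhotelseattle.com",
        "www.example.org",
        "www.example.com",
    ],
    "company_name": ["Hotel Prestige Seattle", "Example Org", 12345.0],
})

# Cast both columns to strings so every value has a length.
for col in ("url_entrance", "company_name"):
    df_combo[col] = df_combo[col].astype(str)

# Score each row by pairing the two columns element by element.
df_combo["fuzz_ratio"] = [
    ratio(url, name)
    for url, name in zip(df_combo["url_entrance"], df_combo["company_name"])
]

print(df_combo[["url_entrance", "company_name", "fuzz_ratio"]])
```

Pairing the columns with `zip` keeps the result the same length as the original table, one score per row, rather than comparing every URL against every company name.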

Stanleyrr

This was helpful. In my case, I had some NaN values, so a simple `df['a'].fillna(' ', inplace=True)` before running the fuzzy match worked. – cyril Apr 26 '17 at 20:43
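A small sketch of the NaN handling mentioned in this comment, using a hypothetical column "a" with made-up values. It assigns the filled column back instead of calling `inplace=True` on a column selection, since newer pandas versions may warn about that pattern or leave the original frame unchanged. It also fills with an empty string rather than a space, which is just a choice for this example.

```python
import pandas as pd

# Hypothetical sample column with two kinds of missing values.
df = pd.DataFrame({"a": ["www.example.org", float("nan"), None]})

# Fill missing values first, then cast, and assign the result back.
df["a"] = df["a"].fillna("").astype(str)

print(df["a"].tolist())
```

Filling before casting matters because, depending on the pandas version, `astype(str)` on its own can turn a missing value into the literal text "nan", which would then be scored like any other string.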
