Matching strings in Pandas

Question

I have two dataframes named codes and phrases.

codes:

code	keywords
bg	burger
bg	burgers
cbg	chicken burger
cbg	burger chicken
cbg	chicken burgers
--	--
--	--

phrases:

text
burgers near me
chicken burgers around NYC
--
--

Using Python, I want to build a dataframe like this:

text	code
burgers near me	bg
chicken burgers around NYC	cbg
--	--
--	--

I am trying to identify which keyword from codes best matches each record of phrases.

If I simply use a string contains function, the keyword burgers matches both phrases above. Is there a better way to accomplish this?
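
A minimal reproduction of the ambiguity, assuming the two dataframes hold only the rows shown above (the -- rows are left out) and that the contains function is pandas' Series.str.contains; the name naive is only illustrative:

```python
import pandas as pd

# Sample data typed in from the tables above (the "--" rows are omitted)
codes = pd.DataFrame({
    'code': ['bg', 'bg', 'cbg', 'cbg', 'cbg'],
    'keywords': ['burger', 'burgers', 'chicken burger', 'burger chicken', 'chicken burgers'],
})
phrases = pd.DataFrame({'text': ['burgers near me', 'chicken burgers around NYC']})

# Naive approach: which phrases contain the keyword "burgers"?
naive = phrases['text'].str.contains('burgers')
print(naive.tolist())  # [True, True]: both phrases match, so "bg" cannot be assigned safely
```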

Thanks in advance!

Could [this](https://stackoverflow.com/a/56315491/3275464) help? — Learning is a mess, Jun 07 '23 at 12:38
You could sort "codes" by length of keywords and process it in this order. — Michael Butscher, Jun 07 '23 at 12:42
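
The comment's suggestion can also be applied per phrase: try the keywords from longest to shortest and keep the first one found. This is a sketch under the assumption that a keyword should only match as whole words; ordered, pairs and best_code are hypothetical names used for illustration:

```python
import re
import pandas as pd

codes = pd.DataFrame({
    'code': ['bg', 'bg', 'cbg', 'cbg', 'cbg'],
    'keywords': ['burger', 'burgers', 'chicken burger', 'burger chicken', 'chicken burgers'],
})
phrases = pd.DataFrame({'text': ['burgers near me', 'chicken burgers around NYC']})

# Keywords sorted by length, longest first, as suggested in the comment
ordered = codes.assign(Length=codes['keywords'].str.len()).sort_values('Length', ascending=False)
pairs = list(zip(ordered['keywords'], ordered['code']))

def best_code(text):
    """Return the code of the longest keyword found in text as whole words, or '' if none."""
    for keyword, code in pairs:
        if re.search(rf'\b{re.escape(keyword)}\b', text):
            return code
    return ''

phrases['code'] = phrases['text'].apply(best_code)
print(phrases)
```

Each phrase stops at its first (longest) match, so "chicken burgers around NYC" gets cbg before the shorter burgers keyword is ever tried.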

score 0 · Accepted Answer · answered Jun 07 '23 at 12:50

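You can add a column to codes with the length of each keyword and then assign codes starting from the longest keyword. On each iteration, build two boolean masks, one for the rows of phrases that are still blank and one for the rows whose text contains the keyword as a whole word, so that only rows satisfying both are filled. Because longer keywords go first, "chicken burgers" claims the second phrase before "burgers" gets a chance to.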

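import re

phrases['code'] = ''
# Length of each keyword, so the most specific (longest) keywords are tried first
codes['Length'] = codes['keywords'].str.len()
codes = codes.sort_values('Length', ascending=False)

for _, row in codes.iterrows():
    # Rows that have not been assigned a code yet
    ix_blank = phrases['code'].eq('')
    # Rows containing the keyword as whole words (re.escape guards regex special characters)
    ix_match = phrases['text'].str.contains(rf'\b{re.escape(row["keywords"])}\b')
    phrases.loc[ix_blank & ix_match, 'code'] = row['code']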
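
As an alternative to looping over codes with iterrows, the longest-first ordering can be folded into a single regular expression and applied with Series.str.extract. This is a sketch assuming the sample rows above and unique keywords; keywords, pattern and keyword_to_code are illustrative names, not part of the answer:

```python
import re
import pandas as pd

# Sample data from the question (the "--" rows are omitted)
codes = pd.DataFrame({
    'code': ['bg', 'bg', 'cbg', 'cbg', 'cbg'],
    'keywords': ['burger', 'burgers', 'chicken burger', 'burger chicken', 'chicken burgers'],
})
phrases = pd.DataFrame({'text': ['burgers near me', 'chicken burgers around NYC']})

# One alternation with the longest keywords first; re.escape guards special characters
keywords = sorted(codes['keywords'], key=len, reverse=True)
pattern = r'\b(' + '|'.join(re.escape(k) for k in keywords) + r')\b'
keyword_to_code = dict(zip(codes['keywords'], codes['code']))

# extract() returns the matched keyword (NaN when nothing matches); map it to its code
phrases['code'] = (
    phrases['text']
    .str.extract(pattern, expand=False)
    .map(keyword_to_code)
    .fillna('')
)
print(phrases)
```

The semantics differ slightly from the loop: the regex engine takes the match that starts earliest in the text and only prefers longer alternatives at that same position. For a phrase such as "burger and chicken burgers" this version returns bg, while the length-sorted loop returns cbg.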
