I am trying to compare the strings between two DataFrame columns. category_df['column_text_to_find'] contains the strings to look for in another DataFrame's column, df2['column_text_to_search']. The new column df2['matched text'] should contain the category_df['column_text_to_find'] value found in df2['column_text_to_search']. My expected result is:
['column_text_to_search']      ['column_text_to_find']  ['matched text']
'SP * GRAPHICSDIRECT.ascdadv'  'GRAPHICSDIRECT'         'GRAPHICSDIRECT'
'99 CENTS ONLY #777#'          '99 CENTS ONLY'          '99 CENTS ONLY'
'PAYPAL *BESTBUY COM #3422#'   'BESTBUY'                'BESTBUY'
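To make the example reproducible, here is a minimal sketch of the sample data. The values are copied from the table above; my real files have more rows and columns, and `expected` is only a hypothetical name for the output I want.

```python
import pandas as pd

# Terms to look for (values copied from the table above).
category_df = pd.DataFrame(
    {"column_text_to_find": ["GRAPHICSDIRECT", "99 CENTS ONLY", "BESTBUY"]}
)

# Text to search in (values copied from the table above).
df2 = pd.DataFrame(
    {
        "column_text_to_search": [
            "SP * GRAPHICSDIRECT.ascdadv",
            "99 CENTS ONLY #777#",
            "PAYPAL *BESTBUY COM #3422#",
        ]
    }
)

# `expected` is a hypothetical name for the desired result.
expected = df2.assign(
    **{"matched text": ["GRAPHICSDIRECT", "99 CENTS ONLY", "BESTBUY"]}
)
```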
Unfortunately, my code returns an error.
CSV import:
import re
import pandas as pd

for f in all_files:
    df = pd.read_csv(f, sep=',', header=[3])
df2 = df  # df2 refers to the same DataFrame as df (not a copy)
Remove leading and trailing whitespace:
df2['column_text_to_search'] = df2['column_text_to_search'].str.strip()
search and match text:
ch = category_df['column_text_to_find']
pat = r'\b({0})\b'.format('|'.join(ch))
df2['matched text'] = df2['column_text_to_search'].str.findall(pat, flags=re.IGNORECASE).map("_".join)
df2.head()
Error:
TypeError: sequence item 0: expected str instance, tuple found
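One way this exact message can arise, sketched below with plain `re` and no pandas: when a pattern contains more than one capturing group, `re.findall` returns tuples rather than strings, and `"_".join` then fails. That would happen if any term in category_df['column_text_to_find'] contains parentheses. This is only an assumption, since the three sample terms shown contain none; the term "BESTBUY (COM)" below is hypothetical. Escaping each term with `re.escape` and using a non-capturing group `(?:...)` is one possible workaround, not necessarily the right one for the real data.

```python
import re

# Hypothetical terms: the second contains parentheses. This is an assumption;
# the full contents of category_df are not shown in the question.
terms = ["GRAPHICSDIRECT", "BESTBUY (COM)"]
text = "PAYPAL *BESTBUY COM #3422#"

# Pattern built as in the question: the parentheses inside the term add a
# second capturing group, so findall returns tuples instead of strings.
raw_pat = r"\b({0})\b".format("|".join(terms))
raw_matches = re.findall(raw_pat, text, flags=re.IGNORECASE)

try:
    "_".join(raw_matches)
    raw_error = None
except TypeError as exc:
    raw_error = str(exc)

# Possible workaround: escape each term so metacharacters are literal, and
# use a non-capturing group so findall returns whole matches as strings.
safe_pat = r"\b(?:{0})\b".format("|".join(re.escape(t) for t in terms))
safe_matches = re.findall(
    safe_pat, "SP * GRAPHICSDIRECT.ascdadv", flags=re.IGNORECASE
)
safe_joined = "_".join(safe_matches)

print(raw_matches)
print(raw_error)
print(safe_joined)
```

Note that once a term is escaped, `\b` next to a non-word character such as `)` may no longer match where expected, so terms that start or end with punctuation may need different boundary handling.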