I have a data frame containing transactions from a financial institution. One of the columns, ['vendor_full'], holds the vendor, but it can also contain store numbers, physical locations, etc., which makes it difficult to summarize by who the vendor actually is.
I have created a dictionary where the key is the vendor name as it may appear in the data frame (or at least a part of the column string) and the value is the vendor name as I would like it to be written to a new column ['vendor_short'].
Based on this question and the answer from @Vaishali, I got very close to a solution. The difference is that the user who posted that question wanted to use the dictionary value as both the search term and the returned value, whereas I would like to search for the key and return the value.
import pandas as pd
data = {'amount': [100, 150, 5, 89, 55, 14], 'vendor_full': ['store_name 1234', 'online_store xx55', 'st_name 9876', 'grocery_store', 'online_shop', 'clothing_store xx']}
cols = ['amount', 'vendor_full']
df = pd.DataFrame(data, columns=cols)
vendor_dict = {'store_name': 'store_name', 'online_store': 'online_store', 'st_name': 'store_name', 'grocery_store': 'grocery_store', 'online_shop': 'online_store', 'clothing_store': 'clothing_store'}
pat = r'({})'.format('|'.join(vendor_dict.values()))
cond = df['vendor_full'].str.contains('|'.join(vendor_dict.keys()))
df.loc[cond, 'vendor_short'] = df['vendor_full'].str.extract(pat, expand=False)
The code above seems to work for the first occurrence of a vendor, but I'm getting NaN for the remaining occurrences.
Actual:
   amount        vendor_full    vendor_short
0     100    store_name 1234      store_name
1     150  online_store xx55    online_store
2       5       st_name 9876             NaN
3      89      grocery_store   grocery_store
4      55        online_shop             NaN
5      14  clothing_store xx  clothing_store
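A quick check that may help narrow this down: the extraction pattern is built from vendor_dict.values(), so a row can only match if its text contains one of the values. The sketch below lists the keys that are not also values; the name `missing` is just an illustrative variable, not part of the original code.

```python
vendor_dict = {'store_name': 'store_name', 'online_store': 'online_store',
               'st_name': 'store_name', 'grocery_store': 'grocery_store',
               'online_shop': 'online_store', 'clothing_store': 'clothing_store'}

# Keys that never appear among the values cannot be found by a
# pattern that was built from the values alone.
values = set(vendor_dict.values())
missing = [key for key in vendor_dict if key not in values]
print(missing)
```

The two keys it reports are the ones behind the NaN rows (2 and 4) in the output above.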
Expected/desired:
   amount        vendor_full    vendor_short
0     100    store_name 1234      store_name
1     150  online_store xx55    online_store
2       5       st_name 9876      store_name
3      89      grocery_store   grocery_store
4      55        online_shop    online_store
5      14  clothing_store xx  clothing_store
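One possible way to get the desired output, offered as a minimal sketch and not the only approach: build the pattern from the dictionary keys instead of the values, extract the matched key, and then translate it with Series.map. Escaping the keys with re.escape and sorting them longest-first are extra precautions I am assuming are wanted (in case a key contains regex metacharacters or one key is a prefix of another); neither matters for this sample data.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'amount': [100, 150, 5, 89, 55, 14],
    'vendor_full': ['store_name 1234', 'online_store xx55', 'st_name 9876',
                    'grocery_store', 'online_shop', 'clothing_store xx'],
})
vendor_dict = {'store_name': 'store_name', 'online_store': 'online_store',
               'st_name': 'store_name', 'grocery_store': 'grocery_store',
               'online_shop': 'online_store', 'clothing_store': 'clothing_store'}

# Search for the KEYS; longest first so a short key cannot shadow a longer one.
keys = sorted(vendor_dict, key=len, reverse=True)
pat = '({})'.format('|'.join(re.escape(k) for k in keys))

# extract() returns the matched key (NaN when nothing matches);
# map() then looks that key up in the dictionary to get the short name.
df['vendor_short'] = df['vendor_full'].str.extract(pat, expand=False).map(vendor_dict)
print(df)
```

Rows with no matching key stay NaN in vendor_short, so the separate str.contains mask is not needed in this version.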