I have a pandas DataFrame with multiple columns and a dict whose values are lists of strings. One column in the DataFrame holds a product description, and I need to check whether each description matches one of the values in the dict's lists.
This is an extract from the dict:
clothing_types = {'T-Shirt': ['t-shirt', 'shirt', 'tee'],
                  'Tank Top': ['tank top', 'mesh', 'top', 'tank'],
                  'Socks': ['socks'],
                  'Hat': ['cap'],
                  'Trainers': ['trainers', 'snickers', 'shoes', 'furylite contemporary']}
This is the column:
0 UNDER ARMOUR LADIES FLY-BY STRETCH MESH TANK TOP
1 UNDER ARMOUR LADIES SPEEDFORM NO SHOW SOCKS
2 UNDER ARMOUR LADIES SPEEDFORM NO SHOW SOCKS
3 UNDER ARMOUR LADIES PLAY UP SHORTS
4 REEBOK LADIES CLASSIC LEATHER MID TRAINERS
5 UNDER ARMOUR MENS Spring Performance Oxford SHIRT
6 UNDER ARMOUR LADIES HEATGEAR ALPHA SHORTY SHORTS
7 ADIDAS LADIES PRO TANK
8 REEBOK LADIES ONE SERIES V NECK T-SHIRT
9 REEBOK LADIES DF LONG BRA
10 NIKE LADIES BASELINE TENNIS SKIRT
11 UNDER ARMOUR MENS ESCAPE 7" SOLID SHORTS
12 UNDER ARMOUR LADIES FLY-BY STRETCH MESH TANK TOP
I can do the comparison with plain for loops:
for item in self.original_file['Product Description'].tolist():
    found = False
    for item_type, type_descriptions in clothing_types.items():
        for description in type_descriptions:
            if description.upper() in item.upper():
                # print(item_type, item)
                found = True
                break
    if not found:
        print('NOT FOUND', item)
I have also tried to do it with np.where:
for item_type, type_descriptions in clothing_types.items():
    for description in type_descriptions:
        self.original_file['Category'] = np.where(description.upper() in self.original_file['Product Description'], item_type, 'None')
but every iteration overwrites the column with the result of the latest comparison, so every value in Category always ends up as 'None'.
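A small check that may explain part of the symptom, assuming pandas is imported as pd and using a hypothetical two-row Series named descriptions: the `in` operator on a Series tests its index labels rather than the strings it holds, so the condition passed to np.where is a single boolean instead of one value per row. `Series.str.contains` is the row-wise counterpart.

```python
import pandas as pd

# Hypothetical stand-in for self.original_file['Product Description'].
descriptions = pd.Series([
    "UNDER ARMOUR LADIES SPEEDFORM NO SHOW SOCKS",
    "ADIDAS LADIES PRO TANK",
])

# `in` on a Series looks at the index labels (0 and 1 here), not the values,
# so this is one scalar boolean rather than one result per row.
scalar_check = "SOCKS" in descriptions

# str.contains tests every row and returns a boolean Series.
row_check = descriptions.str.contains("SOCKS", regex=False)

print(scalar_check)
print(row_check.tolist())
```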
The expectation is that if, say, "SHIRT" appears in the description, then "T-Shirt" (the matching key of the dict) is written to the new column, Category.
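To make the expected result concrete, here is a minimal sketch of the behaviour I am after, not a definitive solution. It assumes a standalone DataFrame named df (standing in for self.original_file), a shortened copy of the dict, and that the first matching key in dict order should win when several keys match; that tie-breaking rule is my assumption.

```python
import re

import pandas as pd

# Shortened copy of the dict from the question.
clothing_types = {
    'T-Shirt': ['t-shirt', 'shirt', 'tee'],
    'Tank Top': ['tank top', 'mesh', 'top', 'tank'],
    'Socks': ['socks'],
}

# Hypothetical stand-in for self.original_file.
df = pd.DataFrame({'Product Description': [
    'UNDER ARMOUR LADIES FLY-BY STRETCH MESH TANK TOP',
    'UNDER ARMOUR LADIES SPEEDFORM NO SHOW SOCKS',
    'UNDER ARMOUR LADIES PLAY UP SHORTS',
    'REEBOK LADIES ONE SERIES V NECK T-SHIRT',
]})

# Start with the default and fill in matches one category at a time.
df['Category'] = 'None'
for item_type, type_descriptions in clothing_types.items():
    # One alternation pattern per category; re.escape keeps '-' and spaces literal.
    pattern = '|'.join(re.escape(d) for d in type_descriptions)
    matches = df['Product Description'].str.contains(
        pattern, case=False, regex=True, na=False
    )
    # Assumption: only rows still unassigned are filled, so the first key wins.
    df.loc[matches & (df['Category'] == 'None'), 'Category'] = item_type

print(df)
```

Because this is plain substring matching, a short value such as 'top' would also match inside a longer word, which is the same behaviour as the for-loop version above.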