I have some scraped data from an ecommerce website and that has the package unit count in the name (see example below). I want to take the unit count information from the name and add the number of units as a int into a "Unit" column. I know I can use df.loc[product_column].str.contains('10 pk'), unitColumn] = 10
, or even loop through a list that holds sample strings of each unit count. But it becomes a little more cumbersome when your looking at data from 150 stores.
What I'm looking for is a way for python to figure it out automatically and change set the value for with ML. I don't know if it's possible but I'm hoping someone can point me in then right direction.
Product name 10 pack
flavor 10 Pack product name
10 pack name
product name flavor 10pk
flavor product name 14Pk
1gx14 Product name
name 2-Pack
2pk different name
store name 3 pack
product name 5 pack store name
name 5 pack
5 Pack product name
Name 5-pack product name
randome name 5pk product name
name 5pk flavor
6 Pack flavor
flavor 7 pack
prduct name 7 pack
7pk flavor
pack x2 name
I know I can do
df.loc[product_column].str.contains('10 pk'), unitColumn] = 10
But I'm looking for more of an automated solution.