Suppose you have a DataFrame df as shown below.
               product_description
0  kanchivaram saree of red colour
1               Pink gujrati saree
2               Lehenga from Surat
3                    Red swim suit
You will need a list of the words that may appear in the product_type column, as below.
lst = ['saree', 'Lehenga', 'swim suit']
Each row of the product_description column is then checked against the words in lst, and the matching word is written to a new product_type column, as in the code below.
Using regex (efficient for big DataFrames, and case-insensitive). Only the first match in each row is extracted, and it keeps the casing used in the description.
import re
import pandas as pd
# Initialize the data.
data = {'product_description': ['kanchivaram saree of red colour', 'Pink gujrati saree', 'Lehenga from Surat', 'Red swim suit']}
# Create the DataFrame.
df = pd.DataFrame(data)
lst = ['saree', 'Lehenga', 'swim suit']
# Build one case-insensitive alternation pattern; re.escape guards against special characters.
regex = re.compile(fr"({'|'.join(re.escape(x) for x in lst)})", re.IGNORECASE)
# expand=False returns a Series holding the first match per row (NaN when nothing matches).
df['product_type_using_regex'] = df['product_description'].str.extract(regex, expand=False)
df
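Because the regex match is case-insensitive, the extracted text keeps whatever casing the description used. If you want the spelling from lst instead, one possible approach is to lower-case the match and map it back through a dictionary. This is only a sketch: the sample descriptions and the names extracted and canonical are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sample data with mixed casing and one row that matches nothing.
df = pd.DataFrame({'product_description': ['Kanchivaram SAREE of red colour',
                                           'lehenga from Surat',
                                           'Red Swim Suit',
                                           'Blue jeans']})
lst = ['saree', 'Lehenga', 'swim suit']

regex = re.compile('(' + '|'.join(re.escape(x) for x in lst) + ')', re.IGNORECASE)
# First match per row, in the casing found in the description (NaN if no match).
extracted = df['product_description'].str.extract(regex, expand=False)

# Map the lower-cased match back to the spelling used in lst.
canonical = {x.lower(): x for x in lst}
df['product_type'] = extracted.str.lower().map(canonical)
```

Rows without a match stay NaN, since both str.lower and map pass missing values through.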
Alternate method (case-sensitive)
Complete code:
import numpy as np
import pandas as pd
# Initialize the data.
data = {'product_description': ['kanchivaram saree of red colour', 'Pink gujrati saree', 'Lehenga from Surat', 'Red swim suit']}
# Create the DataFrame.
df = pd.DataFrame(data)
lst = ['saree', 'Lehenga', 'swim suit']
# Join every word from lst found in the description with ';'; rows with no match become NaN.
df['product_type'] = df['product_description'].apply(lambda x: ';'.join([m for m in lst if m in x])).replace('', np.nan)
df
Output:
               product_description  product_type
0  kanchivaram saree of red colour         saree
1               Pink gujrati saree         saree
2               Lehenga from Surat       Lehenga
3                    Red swim suit     swim suit
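The alternate method joins every matching word with ';', while the regex version keeps only the first match. If you need both behaviours at once (all matches, case-insensitive), a rough sketch using str.findall could look like the following. The sample rows and the names pattern and matches are assumptions made for this example.

```python
import re
import numpy as np
import pandas as pd

# Hypothetical sample data: one row with two matches, one with none.
df = pd.DataFrame({'product_description': ['Saree and lehenga combo',
                                           'Red swim suit',
                                           'Blue jeans']})
lst = ['saree', 'Lehenga', 'swim suit']

pattern = '|'.join(re.escape(x) for x in lst)
# findall returns a list of all case-insensitive matches for each row.
matches = df['product_description'].str.findall(pattern, flags=re.IGNORECASE)
# Join the lists with ';' and turn empty results into NaN.
df['product_type'] = matches.str.join(';').replace('', np.nan)
```

Note that the matches appear in the order and casing found in the description, not in the order of lst.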