I have a dataframe such as:
import pandas as pd
import re
df = pd.DataFrame({"Name": ["D1", "D2", "D3", "D4", "M1", "M2", "M3"],
"Requirements": ["3 meters|2/3 meters|3.5 meters",
"3 meters",
"3/5 meters|3 meters",
"2/3 meters",
"steel|g1_steel",
"steel",
"g1_steel"]})
dataframe df
Name Requirements
0 D1 3 meters|2/3 meters|3.5 meters
1 D2 3 meters
2 D3 3/5 meters|3 meters
3 D4 2/3 meters
4 M1 steel|g1_steel
5 M2 steel
6 M3 g1_steel
I have a list of words req_list = ['3 meters', 'steel']
and I am trying to extract rows from df
where the strings in column Requirements
contain standalone words that are from req_list
. This is what I have done:
This one prints just D2 and M2
df[df.Requirements.apply(lambda x: any(len(x.replace(y, '')) == 0 for y in req_list))]
This one prints all rows
df[df['Requirements'].str.contains(fr"\b(?:{'|'.join(req_list)})\b")]
My desired result is as follows:
Name Requirements
0 D1 3 meters|2/3 meters|3.5 meters
1 D2 3 meters
2 D3 3/5 meters|3 meters
4 M1 steel|g1_steel
5 M2 steel
In this desired output, D4 and M3 are eliminated because they do not have words from req_list
as standalone strings. Is there any way to achieve this preferably in an one-liner without using custom functions?
EDIT
The strings in the column Requirements
can come in any pattern such as:
Name Requirements
0 D1 3 meters|2/3 meters|3.5 meters
1 D2 3 meters
2 D3 3/5 meters|3 meters
3 D4 2/3 meters
4 D5 3::3 meters # New pattern which needs to be eliminated
5 D6 3.3 meters # New pattern which needs to be eliminated
6 D7 3?3 meters # New pattern which needs to be eliminated
7 M1 steel|g1_steel
8 M2 steel
9 M3 g1_steel