This is a follow-up to my earlier question: Remove substring from string if substring in list in data frame column
I have the following data frame df1:

   string              lists
0  I HAVE A PET DOG    ['fox', 'pet dog', 'cat']
1  there is a cat      ['dog', 'house', 'car']
2  hello EVERYONE      ['hi', 'hello', 'everyone']
3  hi my name is Joe   ['name', 'was', 'is Joe']
I'm trying to return a data frame df2 that looks like this:

   string              lists                         new_string
0  I HAVE A PET DOG    ['fox', 'pet dog', 'cat']     I HAVE A
1  there is a cat      ['dog', 'house', 'car']       there is a cat
2  hello EVERYONE      ['hi', 'hello', 'everyone']
3  hi my name is Joe   ['name', 'was', 'is Joe']     hi my
The solution I was using does not work for cases where a substring is multiple words, such as 'pet dog' or 'is Joe':
df['new_string'] = df['string'].apply(lambda x: ' '.join([word for word in x.split() if word.lower() not in df['lists'][df['string'] == x].values[0]]))
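The one-liner above checks each whitespace-separated word on its own, so a list entry like 'pet dog' can never equal any single word. One possible direction, sketched here only as an illustration, is to build a case-insensitive regular expression per row from that row's list and strip the matches. This assumes the lists column holds real Python lists (not their string representation) and that matches should respect word boundaries; remove_phrases is a hypothetical helper name, not something from pandas.

```python
import re
import pandas as pd

# Sample data copied from the question; "lists" is assumed to hold real lists.
df = pd.DataFrame({
    "string": ["I HAVE A PET DOG", "there is a cat",
               "hello EVERYONE", "hi my name is Joe"],
    "lists": [["fox", "pet dog", "cat"], ["dog", "house", "car"],
              ["hi", "hello", "everyone"], ["name", "was", "is Joe"]],
})

def remove_phrases(text, phrases):
    """Hypothetical helper: drop every phrase from text, ignoring case."""
    # Try longer phrases first so a multi-word entry wins over a shorter overlap.
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape guards against regex metacharacters inside the phrases;
    # \b keeps 'car' from matching inside a longer word.
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    stripped = re.sub(pattern, "", text, flags=re.IGNORECASE)
    # Collapse the leftover whitespace where phrases were removed.
    return " ".join(stripped.split())

# Pair each string with its own row's list instead of looking it up by value.
df["new_string"] = [remove_phrases(s, l) for s, l in zip(df["string"], df["lists"])]
```

Zipping the two columns also avoids the df['lists'][df['string'] == x] lookup, which would pick the first matching row if the same string appeared more than once.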