I'm very new to Python and still don't fully understand how to use it correctly, so please bear with me.
Let's say we have a dataframe like this:
import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5]],
                         columns=["a", "b", "c"])
print(samp_data)
   a                        b  c
0  1              hello there  3
1  4     im just saying hello  6
2  7  but sometimes i say bye  9
3  2        random words here  5
And we define a list of words we don't want:
unwanted_words = ['hello', 'random']
I want to write a function that will exclude all rows where column b contains any words in the "unwanted_words" list. So the output should be:
print(samp_data)
   a                        b  c
2  7  but sometimes i say bye  9
What I've tried so far includes the built-in isin() method:
data = samp_data.ix[samp_data['b'].isin(unwanted_words),:]
But this does not exclude the rows as I expected. I also tried the str.contains() method:
for i, row in samp_data.iterrows():
    if unwanted_words.str.contains(row['b']).any():
        print('found matching words')
This throws errors.
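For what it's worth, here is a small sketch of why both attempts probably behave this way. It assumes the sample data above; the names exact_match, list_has_str and series_has_str are made up for illustration only.

```python
import pandas as pd

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5]],
                         columns=["a", "b", "c"])
unwanted_words = ['hello', 'random']

# isin() tests whether the WHOLE cell equals one of the list items,
# so no row matches here: 'hello there' is not equal to 'hello'.
exact_match = samp_data['b'].isin(unwanted_words)
print(exact_match.any())

# .str is an accessor on a pandas Series of strings;
# a plain Python list does not have it.
list_has_str = hasattr(unwanted_words, 'str')
series_has_str = hasattr(samp_data['b'], 'str')
print(list_has_str, series_has_str)
```

So isin() looks like the wrong tool for substring checks, and str.contains() seems to need to be called on the column rather than on the list.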
I think I'm just overcomplicating things, and there must be some really easy way that I'm not aware of. Any help is greatly appreciated!
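To make the goal concrete, this is roughly the kind of function I have in mind. It is only a sketch, not a definitive answer: the name drop_rows_with_words is hypothetical, and it assumes that "contains a word" means a whole-word match (hence the \b word boundaries), which may or may not be the right rule for real data.

```python
import re
import pandas as pd

def drop_rows_with_words(df, column, words):
    """Return the rows of df whose `column` text contains none of `words`."""
    if not words:
        # Nothing to exclude, so keep every row.
        return df.copy()
    # Build one regex alternation, e.g. r'\b(?:hello|random)\b'.
    # re.escape guards against words that contain regex metacharacters.
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    # na=False treats missing values as "no match", so those rows are kept.
    mask = df[column].str.contains(pattern, regex=True, na=False)
    return df[~mask]

samp_data = pd.DataFrame([[1, 'hello there', 3],
                          [4, 'im just saying hello', 6],
                          [7, 'but sometimes i say bye', 9],
                          [2, 'random words here', 5]],
                         columns=["a", "b", "c"])
unwanted_words = ['hello', 'random']

result = drop_rows_with_words(samp_data, 'b', unwanted_words)
print(result)
```

On the sample data this should leave only the row with index 2. Dropping the \b anchors would switch to plain substring matching, so 'hello' would then also match something like 'hellos'.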
Posts I have read so far (not an exhaustive list, as I have already closed many windows):