I have a huge set of data. Something like 100k lines and I am trying to drop a row from a dataframe if the row, which contains a list, contains a value from another dataframe. Here's a small time example.
has = [['@a'], ['@b'], ['#c, #d, #e, #f'], ['@g']]
use = [1,2,3,5]
z = ['#d','@a']
df = pd.DataFrame({'user': use, 'tweet': has})
df2 = pd.DataFrame({'z': z})
tweet user
0 [@a] 1
1 [@b] 2
2 [#c, #d, #e, #f] 3
3 [@g] 5
z
0 #d
1 @a
The desired outcome would be
tweet user
0 [@b] 2
1 [@g] 5
Things i've tried
#this seems to work for dropping @a but not #d
for a in range(df.tweet.size):
for search in df2.z:
if search in df.loc[a].tweet:
df.drop(a)
#this works for my small scale example but throws an error on my big data
df['tweet'] = df.tweet.apply(', '.join)
test = df[~df.tweet.str.contains('|'.join(df2['z'].astype(str)))]
#the error being "unterminated character set at position 1343770"
#i went to check what was on that line and it returned this
basket.iloc[1343770]
user_id 17060480
tweet [#IfTheyWereBlackOrBrownPeople, #WTF]
Name: 4612505, dtype: object
Any help would be greatly appreciated.