I have a CSV file that looks like this:
screen_name,tweet,following,followers,is_retweet,bot
narutouz16,Grad school is lonely.,59,20,0,0
narutouz16,RT @GetMadz: Sound design in this game is 10/10 game freak lied. ,59,20,1,0
narutouz16,@hbthen3rd I know I don't.,59,20,0,0
narutouz16,"@TonyKelly95 I'm still not satisfied in the ending, even though its longer.",59,20,0,0
narutouz16,I'm currently in second place in my leaderboards in duolongo.,59,20,0,0
I am able to read this into a dataframe using the following:
import pandas as pd
df = pd.read_csv("file.csv")
That works great. print(df.shape) gives the following dimensions:
(1223726, 6)
I have a list of usernames, like below:
bad_names = ['BELOZEROVNIKIT', 'ALTMANBELINDA', '666STEVEROGERS', 'ALVA_MC_GHEE', 'CALIFRONIAREP', 'BECCYWILL', 'BOGDANOVAO2', 'ADELE_BROCK', 'ANN1EMCCONNELL', 'ARONHOLDEN8', 'BISHOLORINE', 'BLACKTIVISTSUS', 'ANGELITHSS', 'ANWARJAMIL22', 'BREMENBOTE', 'BEN_SAR_GENT', 'ASSUNCAOWALLAS', 'AHMADRADJAB', 'AN_N_GASTON', 'BLACK_ELEVATION', 'BERT_HENLEY', 'BLACKERTHEBERR5', 'ARTHCLAUDIA', 'ALBERTA_HAYNESS', 'ADRIANAMFTTT']
What I want to do is loop through the dataframe and, if the username (the screen_name column) is in this list, remove that row from df and add it to a new dataframe called bad_names_df.
Pseudocode would look like:
for each row in df:
    if row.username in bad_names:
        bad_names_df.append(row)
        df.remove(row)
    else:
        continue
My attempt:
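bad_rows = []  # matching rows are collected here first
for idx, row in df.iterrows():  # iterrows() yields (index, row) pairs
    if row['screen_name'] in bad_names:  # the CSV column is screen_name
        bad_rows.append(row)
# Build the new dataframe once; this does not yet remove the rows from df
bad_names_df = pd.DataFrame(bad_rows)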
How can I efficiently go through df, which has over 1.2M rows, and, if the username is in the bad_names list, remove that row from df and add it to bad_names_df? I have not found any other SO posts that address this issue.
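One approach that seems to avoid the explicit loop is a boolean mask built with Series.isin, which splits the dataframe in two steps. This is only a minimal sketch under a few assumptions: the column to match is screen_name (the CSV header has no username column), names must match exactly, and the small dataframe below is a made-up stand-in for file.csv with placeholder tweets.

```python
import pandas as pd

# Made-up stand-in for the real file; only two of the six columns are kept
# and the tweets for the bad names are placeholders.
df = pd.DataFrame({
    "screen_name": ["narutouz16", "BECCYWILL", "narutouz16", "ADELE_BROCK"],
    "tweet": ["Grad school is lonely.", "placeholder a",
              "@hbthen3rd I know I don't.", "placeholder b"],
})
bad_names = ["BECCYWILL", "ADELE_BROCK", "ARONHOLDEN8"]

# Boolean mask: True for every row whose screen_name is in bad_names.
# Assumption: exact, case-sensitive match. If case can differ, compare
# df["screen_name"].str.upper() against an upper-cased list instead.
mask = df["screen_name"].isin(bad_names)

# Rows that match go to the new dataframe; the rest stay in df.
bad_names_df = df[mask].reset_index(drop=True)
df = df[~mask].reset_index(drop=True)
```

Note that the sample rows in the CSV use lowercase names while bad_names is all uppercase, so the case handling mentioned in the comment may matter for the real data.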