Remove items from dataframe faster

Question

I have a fairly large data frame from which I need to remove values. I currently use this code:

for sha in shas:
    df = df[~df['SHA256'].str.contains(sha, regex=False)]

However, this doesn't scale well if shas gets sufficiently large. Is there a more efficient and faster way to drop elements from a dataframe?

How about `df[~df['SHA256'].str.contains('|'.join(shas), regex=True)]`? — Chris, Aug 06 '19 at 08:50

Accepted Answer · 2019-08-06T10:35:02.867

1

You may want to use the isin() method rather than looping through shas. It builds the mask in a single pass over the column instead of one pass per hash.

df = df[~df['SHA256'].isin(shas)]
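To make the behaviour concrete, here is a minimal sketch on a toy frame. The hash strings and the extra size column are made up for illustration; only the SHA256 column name and the shas variable come from the question.

```python
import pandas as pd

# Toy data: short placeholder strings stand in for real SHA256 digests.
df = pd.DataFrame({
    "SHA256": ["aaa111", "bbb222", "ccc333", "ddd444"],
    "size": [10, 20, 30, 40],
})

# Hashes to drop. isin accepts any list-like, including a set.
shas = {"bbb222", "ddd444"}

# One vectorised membership test, then invert the mask to keep the rest.
filtered = df[~df["SHA256"].isin(shas)]

print(filtered["SHA256"].tolist())  # ['aaa111', 'ccc333']
```

Note that isin compares whole values, so a row is dropped only when its SHA256 equals one of the entries in shas.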

Edit: This solution only applies when the values match exactly. If you need to drop values that contain some other value as a substring, then check this solution
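For the substring case, one possible approach (along the lines of the comment under the question, not necessarily what the linked solution does) is to combine all entries into a single pattern so the column is scanned once. A minimal sketch, assuming the entries should be matched literally, hence re.escape; the sample data is made up:

```python
import re

import pandas as pd

# Toy data: some values merely contain one of the unwanted fragments.
df = pd.DataFrame({"SHA256": ["xxaaa111", "bbb222", "ccc333yy", "ddd444"]})
shas = ["aaa111", "ddd"]

if shas:
    # Escape each entry so it is matched literally, then OR them together.
    pattern = "|".join(re.escape(s) for s in shas)
    # na=False treats missing values as "no match" so they are kept.
    mask = df["SHA256"].str.contains(pattern, regex=True, na=False)
    filtered = df[~mask]
else:
    # An empty pattern would match every row, so skip filtering entirely.
    filtered = df

print(filtered["SHA256"].tolist())  # ['bbb222', 'ccc333yy']
```

The guard for an empty shas matters: joining an empty list yields an empty pattern, which matches every string and would remove all rows.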

edited Aug 06 '19 at 10:35

answered Aug 06 '19 at 09:12

