I have this little program, but it runs so slow

Question

I have this code:

for idx, item in df['product_code'].iteritems():  
    value = item in dj['product_numb'].values  
    if value == False:  
        df = df.drop(idx)

df is loaded from a CSV with over 30k rows and 600+ columns.

dj is a smaller DataFrame containing the values I want to match against df.

My main question is: how do I make this take less than 3 hours to complete?

Your question is confusing. Please see [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit] your question to include a [mcve] with sample input data and expected output so that we can understand your question better. — G. Anderson, Dec 07 '21 at 00:23
This will be slow: `for idx, item in df['product_code'].iteritems():`, but this: `value = item in dj['product_numb'].values` will make it catastrophically slow — juanpa.arrivillaga, Dec 07 '21 at 00:29
Can you write a title that helps other people _with the exact same technical problem_ find your question? A better example might be something like "How can I efficiently find intersections between two dataframes?", if in fact that's an accurate description of what you're trying to do. The goal in writing a title is for the next person with the same problem to be able to Google up your question and its answers, instead of needing to ask their own. — Charles Duffy, Dec 07 '21 at 00:36
Yes, please read [ask] and the [help] for advice on how to write on-topic questions. The key thing to remember is that your question/title should be *helpful to other people*. — juanpa.arrivillaga, Dec 07 '21 at 00:40

score 1 · Answer 1 · answered Dec 07 '21 at 00:29


Try this:

df = df[df['product_code'].isin(dj['product_numb'])]
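To see what that line does, here is a minimal sketch on toy data. The frames below are made up for illustration (the `price` column and all values are assumptions); only the column names `product_code` and `product_numb` come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames in the question.
df = pd.DataFrame({
    "product_code": ["A1", "B2", "C3", "D4"],
    "price": [10, 20, 30, 40],  # made-up extra column
})
dj = pd.DataFrame({"product_numb": ["B2", "D4", "Z9"]})

# Boolean mask: True where product_code appears in dj['product_numb'].
mask = df["product_code"].isin(dj["product_numb"])

# Keep only the matching rows; the original index labels are preserved.
filtered = df[mask]
print(filtered)
```

This should keep the same rows as the loop in the question, but it builds the mask in one vectorized pass instead of scanning `dj['product_numb'].values` once per row and copying `df` on every `drop`. If you want a fresh 0..n-1 index afterwards, `filtered.reset_index(drop=True)` should do it.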

