I have this code:

for idx, item in df['product_code'].iteritems():
    value = item in dj['product_numb'].values
    if value == False:
        df = df.drop(idx)

df is a DataFrame loaded from a CSV with over 30k rows and 600+ columns.

dj is a smaller DataFrame containing the values I want to match in df.

My main question is: how do I make this take less than 3 hours to complete?

FlaskyPG
  • Your question is confusing. Please see [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and [edit] your question to include a [mcve] with sample input data and expected output so that we can understand your question better. – G. Anderson Dec 07 '21 at 00:23
  • This will be slow: `for idx, item in df['product_code'].iteritems():`, but this: `value = item in dj['product_numb'].values` will make it catastrophically slow, because it scans the whole array again for every row – juanpa.arrivillaga Dec 07 '21 at 00:29
  • `df_filtered = df[df.product_code.isin(dj.product_numb)]` – anon01 Dec 07 '21 at 00:29
  • Can you write a title that helps other people _with the exact same technical problem_ find your question? A better example might be something like "How can I efficiently find intersections between two dataframes?", if in fact that's an accurate description of what you're trying to do. The goal in writing a title is for the next person with the same problem to be able to Google up your question and its answers, instead of needing to ask their own. – Charles Duffy Dec 07 '21 at 00:36
  • Yes, please read [ask] and the [help] for advice on how to write on-topic questions. The key thing to remember is that your question/title should be *helpful to other people*. – juanpa.arrivillaga Dec 07 '21 at 00:40

1 Answer

Instead of looping and dropping rows one at a time, filter with a boolean mask built by `isin`:

df = df[df['product_code'].isin(dj['product_numb'])]
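
To see what that line does, here is a minimal sketch on small made-up frames. Only the column names `product_code` and `product_numb` come from the question; the sample values and the `price` column are invented for illustration. `isin` computes the whole membership mask in one vectorized pass, so there is no Python-level loop and no repeated `df.drop`, which returns a new DataFrame on every call.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "product_code": ["A1", "B2", "C3", "D4"],
    "price": [10, 20, 30, 40],
})
dj = pd.DataFrame({"product_numb": ["B2", "D4", "Z9"]})

# Boolean Series: True where product_code appears anywhere in dj['product_numb'].
mask = df["product_code"].isin(dj["product_numb"])

# Keep only the matching rows; the original index labels are preserved.
df_filtered = df[mask]

print(df_filtered)
```

If you want the rows that do not match instead, invert the mask with `df[~mask]`.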