I am looking for a way to get rows of a data frame based on the min value of a specific column in a group by operation.

This question shows a perfect example of the problem and also includes a working answer.

However, that operation is computationally expensive. It works on small datasets, but on large ones it takes a very long time to run.

In a SQL query, it is possible to use the ROW_NUMBER function and filter on the row number to get the row with the min value, as shown here. That seems to be much faster, but what can I do when I already have a pandas DataFrame?

I reckon there might be a cheaper way to execute this operation.
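
To make the question concrete, here is a minimal sketch of the two approaches on made-up data. The frame and the column names `group`, `value`, and `other` are hypothetical placeholders, not my real dataset. The first approach uses `idxmin` per group; the second is my assumption of what a pandas analogue of `ROW_NUMBER() OVER (PARTITION BY group ORDER BY value) = 1` could look like: sort once, then keep the first row of each group.

```python
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "value": [3, 1, 5, 2, 4],
    "other": ["x", "y", "z", "w", "v"],
})

# Approach 1: find the index label of the min value in each group,
# then select those rows.
via_idxmin = df.loc[df.groupby("group")["value"].idxmin()]

# Approach 2: sort by the value column once, keep the first row seen
# for each group, then restore group order for comparison.
via_sort = (
    df.sort_values("value", kind="stable")
      .drop_duplicates(subset="group", keep="first")
      .sort_values("group")
)

print(via_idxmin)
print(via_sort)
```

Both return the same rows on this toy frame. Whether the sort-based version is actually cheaper on a large dataset is something I would still need to time on real data.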

Thanks, everyone.

  • kindly provide sample data with expected output dataframe. you can also share your timings when you use idxmin – sammywemmy Sep 23 '21 at 23:07
  • @sammywemmy, thanks for your answer! Since it was about a large dataset I found it hard to provide sample data. Anyway, I think I found a very good workaround for this problem. Just posted it below!! Thanks – Sergio Polimante Sep 23 '21 at 23:30
  • The question is not the same. The question in the previous post just asks how to do it. the question and answer here are how to do it in an optimal way. I don't think the answer here should be deleted since it is the right answer to the post. So I think the previous post should have a link to this one which is the specific question. What do you think? – Sergio Polimante Sep 24 '21 at 13:32
