How do I NOT use iterrows to solve my problem?

Question

I've been reading about how it is best practice to avoid using iterrows to iterate through a pandas DataFrame, but I am not sure how else I can solve my particular problem:

How can I:

Find the "time" of the first instance of the value "c" in one DataFrame, df1, grouped by "num" and sorted by "time"
Then add that "time" into a separate DataFrame, df2, based on "num".

Here is an example of my input DataFrame:

import pandas as pd

df = pd.DataFrame({'num': [2, 2, 2, 2, 5, 5, 5, 5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 
                           8, 8, 8, 8, 9, 9, 9, 9, 9], 
                   'state': ['a', 'b', 'c', 'b', 'a', 'b', 'c', 'b', 'c', 'b', 'c', 'a', 
                             'b', 'c', 'b', 'c', 'b', 'c', 'a', 'b', 'c', 'b', 'c', 'b', 
                             'c', 'b', 'c', 'b', 'c', 'b'],
                   'time': [234, 239, 244, 249, 100, 105, 110, 115, 120, 125, 130, 3, 8, 
                            13, 18, 23, 28, 33, 551, 556, 561, 566, 571, 576, 581, 45, 50, 
                            55, 60, 65]})

Expected output (df2):

Every solution I attempt seems like it would require iterrows to load the "time" into df2.

Well you don't really need to load anything into `df2`. You can get that from an aggregation of `df1` and then if you need to ensure specific rows always or never appear you would `reindex`. If you provide your data in a format people can easily copy and paste (or runable code) you're more likely to get help. It's just a lot of work to reproduce as is — ALollz, May 11 '21 at 22:27
Please take a look at [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), we ask that questions include a [mcve] with your sample input and expected output in the text of your question rather than as a picture or link — G. Anderson, May 11 '21 at 22:38

Arne · Accepted Answer · 2021-05-11T23:20:30.337

1

You can do it in one line, using df.groupby() with min() as the aggregation function:

df[df.state == 'c'].drop('state', axis=1).groupby('num').aggregate(min)

edited May 11 '21 at 23:20

answered May 11 '21 at 22:50

Arne

9,990
2
18
28

fthomson · Answer 2 · 2021-05-11T23:03:19.880

0

Its hard to check without re-creating the df but i think this should do it

def first_c(group):
    filtered = group[group['state'] == 'c'].iloc[0]
    return filtered[['num', 'time']]


df2 = df.groupby('num').apply(first_c)

Group by num
Apply function and filter for c, find first integer index with iloc
return num and time

edited May 11 '21 at 23:03

answered May 11 '21 at 22:33

fthomson

773
3
9

How do I NOT use iterrows to solve my problem?

2 Answers2