Why don't `first` and `last` in a groupby give me the first and last values?

Question

I'm posting this because the topic just came up in another question/answer, and the behavior isn't well documented.

Consider the dataframe `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan]
))
```

```
   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN
```

I wanted to get the first and last rows of each group defined by column 'A'.

I tried:

```python
df.groupby('A').B.agg(['first', 'last'])
```

```
   first  last
A             
x    1.0   2.0
y    3.0   4.0
```

However, this doesn't give me the `NaN`s that I expected.

How do I get the actual first and last values in each group?

Just noticed that `df.groupby('A').B.agg(['idxmax', 'idxmin'])` gives the same result as `first()` and `last()` – BENY, Aug 18 '17 at 02:33
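The match in that comment looks like a coincidence of this particular `df`: `idxmax` and `idxmin` return index labels, not values, and here every non-null `B` value happens to equal its own row label. A quick check, assuming a hypothetical letter index ('a' through 'f') chosen only to break that coincidence:

```python
import numpy as np
import pandas as pd

# Same data as the question, but with hypothetical letter labels instead of 0..5.
df2 = pd.DataFrame(
    dict(A=list('xxxyyy'), B=[np.nan, 1, 2, 3, 4, np.nan]),
    index=list('abcdef'),
)

# idxmax/idxmin return the row *labels* of the max/min non-null values...
labels = df2.groupby('A').B.agg(['idxmax', 'idxmin'])
print(labels)

# ...while first/last return the first/last non-null *values*.
values = df2.groupby('A').B.agg(['first', 'last'])
print(values)
```

With letter labels, `labels` holds strings such as `'c'` and `'b'` while `values` still holds `1.0` and `2.0`, so the two only agree when labels and values coincide.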

score 7 · Answer 1 · answered Aug 17 '17 at 20:55

As noted here by @unutbu:

The groupby.first and groupby.last methods return the first and last non-null values respectively.
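A minimal sketch of that behavior, using a hypothetical frame where group `z` holds only NaN: leading NaNs are skipped, and a group with no non-null value at all comes back as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'x' starts with NaN, group 'z' is entirely NaN.
df3 = pd.DataFrame(dict(A=list('xxxzz'), B=[np.nan, 1, 2, np.nan, np.nan]))
gb = df3.groupby('A').B

first_non_null = gb.first()  # x -> 1.0 (leading NaN skipped), z -> NaN
last_non_null = gb.last()    # x -> 2.0, z -> NaN (nothing non-null to return)
print(first_non_null)
print(last_non_null)
```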

To get the actual first and last values, do:

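```python
def h(x):
    # x is the Series of B values for one group; take the value at position 0,
    # whether or not it is NaN
    return x.values[0]

def t(x):
    # take the value at the last position, again without skipping NaN
    return x.values[-1]

df.groupby('A').B.agg([h, t])
```

```
     h    t
A          
x  NaN  2.0
y  3.0  NaN
```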
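The same positional idea extends to every column at once, since `DataFrameGroupBy.agg` with a single function applies it to each column of each group. A sketch, assuming a hypothetical extra column `C` and using `iloc` in place of `.values` (both take the value by position):

```python
import numpy as np
import pandas as pd

# Column C is hypothetical, added only to show several columns at once.
df = pd.DataFrame(dict(
    A=list('xxxyyy'),
    B=[np.nan, 1, 2, 3, 4, np.nan],
    C=[10, np.nan, 30, np.nan, 50, 60],
))

gb = df.groupby('A')
# The lambda is called on each column's Series within each group; iloc is
# positional, so a NaN in the first/last row is returned rather than skipped.
first_rows = gb.agg(lambda s: s.iloc[0])
last_rows = gb.agg(lambda s: s.iloc[-1])
print(first_rows)
print(last_rows)
```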

juanpa.arrivillaga · Accepted Answer · 2017-08-17T21:24:25.073

score 6

One option is to use the `.nth` method:

```python
>>> gb = df.groupby('A')
>>> gb.nth(0)
     B
A
x  NaN
y  3.0
>>> gb.nth(-1)
     B
A
x  2.0
y  NaN
```

However, I haven't found a way to aggregate them neatly. Of course, one can always use the `pd.DataFrame` constructor:

```python
>>> pd.DataFrame({'first': gb.B.nth(0), 'last': gb.B.nth(-1)})
   first  last
A
x    NaN   2.0
y    3.0   NaN
```

Note: I explicitly selected `gb.B`; otherwise `nth` returns a one-column DataFrame, and you would have to call `.squeeze()` on it to get a Series.
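One caveat I'm fairly sure of: starting with pandas 2.0, `GroupBy.nth` acts as a filter and returns the selected rows with their original row labels instead of indexing them by `A`, so the constructor above no longer lines up by group. A sketch that should give the group-keyed table on both older and newer versions swaps in `head(1)`/`tail(1)` (which also keep NaN rows) and re-keys the result by `A`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('xxxyyy'), B=[np.nan, 1, 2, 3, 4, np.nan]))
gb = df.groupby('A')

# head(1)/tail(1) return the first/last *row* of each group, NaN included,
# with the original row labels; set_index('A') re-keys them by group.
first = gb.head(1).set_index('A')['B']
last = gb.tail(1).set_index('A')['B']

result = pd.DataFrame({'first': first, 'last': last})
print(result)
```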

edited Aug 17 '17 at 21:24

answered Aug 17 '17 at 21:18


didn't know about `nth` – piRSquared Aug 17 '17 at 21:19
@piRSquared yep, it's a bit obscure for sure. – juanpa.arrivillaga Aug 17 '17 at 21:23
`(lambda g: pd.DataFrame(dict(first=g.nth(0), last=g.nth(-1))))(df.groupby('A').B)` – piRSquared Aug 17 '17 at 21:43
Even add a `min` and `max` to it (-:... `(lambda g: g.agg(['min', 'max']).join(pd.DataFrame(dict(first=g.nth(0), last=g.nth(-1)))))(df.groupby('A').B)` – piRSquared Aug 17 '17 at 21:45
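The min/max plus first/last table from the last comment can also be built in a single `agg` call with named aggregation (keyword arguments to `agg`, available since pandas 0.25). This is a sketch that substitutes `iloc` lambdas for `nth`, not the comment's exact approach:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('xxxyyy'), B=[np.nan, 1, 2, 3, 4, np.nan]))

# Each keyword becomes an output column. min/max skip NaN as usual;
# the lambdas take values by position, so NaN is kept.
summary = df.groupby('A').B.agg(
    min='min',
    max='max',
    first=lambda s: s.iloc[0],
    last=lambda s: s.iloc[-1],
)
print(summary)
```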
