get first and last values in a groupby

Question

I have a dataframe df

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                   ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']],
                  ['X', 'Y'])

How do I get the first and last rows, grouped by the first level of the index?

I tried

df.groupby(level=0).agg(['first', 'last']).stack()

and got

          X   Y
a first   0   1
  last    6   7
b first   8   9
  last   12  13
c first  14  15
  last   16  17
d first  18  19
  last   18  19

This is so close to what I want. How can I preserve the level 1 index and get this instead:

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19
  j  18  19

score 37 · Accepted Answer · edited Jun 20 '20 at 09:12

Option 1

def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)

Option 2 - only works if index is unique

idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]

Option 3 - per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])

Note

per @unutbu: agg(['first', 'last']) take the firs non-na values.

I interpreted this as, it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.

Let's include another test

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))

df.loc[tuple('aa'), 'X'] = np.nan

def first_last(df):
    return df.ix[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
    .set_index('level_1', append=True).reset_index(1, drop=True) \
    .rename_axis([None, None])

Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.

Even though it looks more complicated, the `reset_index/agg` solution is significantly faster than the `groupby/apply` solution when there are a lot of groups. For example, when `df = pd.DataFrame(np.random.randint(100, size=(10**3,4)), columns=list('ABCD')).set_index(['A','B']).rename_axis([None, None])`. — unutbu, Aug 05 '16 at 20:34
Nice solutions! Might also be good to note that `agg(['first', 'last'])` returns the first and last non-NaN value if available. `apply(first_last)` will return the first and last values even if they are NaN. — unutbu, Aug 05 '16 at 21:09
i think for latest pandas version we need df.iloc instead of df.ix see e.g. https://stackoverflow.com/questions/59991397/attributeerror-dataframe-object-has-no-attribute-ix — Richard DiSalvo, Jul 27 '20 at 22:02

score 32 · Answer 2 · answered Aug 08 '18 at 15:31

32

This could be on of the easy solution.

df.groupby(level = 0, as_index= False).nth([0,-1])

      X   Y
a a   0   1
  d   6   7
b e   8   9
  g  12  13
c h  14  15
  i  16  17
d j  18  19

Hope this helps. (Y)

answered Aug 08 '18 at 15:31

Akarsh Jain

930
10
15

1

Helped a lot! :-) – GregOliveira Jul 03 '22 at 17:19

score 22 · Answer 3 · edited Jun 24 '19 at 02:01

22

Please try this:

For last value: df.groupby('Column_name').nth(-1),

For first value: df.groupby('Column_name').nth(0)

edited Jun 24 '19 at 02:01

M.Qasim

1,827
4
33
58

answered Jun 23 '19 at 23:30

nat23dip

329
2
2

1

But simultaneously? – jtlz2 Mar 18 '22 at 14:52

get first and last values in a groupby

3 Answers3

Option 1

Option 2 - only works if index is unique

Option 3 - per notes below, this only makes sense when there are no NAs

Note

Linked