Python pandas get first and last index, duplicate if first is also the last, of group in data frame

Question

I am working on getting the index of the first and last occurrence of IDs in a data frame. But if the ID only appears once, then the last occurrence will be the same as the first one.

For example, given data like this:

ID  Date
A   1/1/2015
A   1/5/2016
A   1/3/2017
B   1/3/2017
C   1/5/2016
C   1/7/2016

the desired output is

ID  Index   Date
A   0   1/1/2015
A   2   1/3/2017
B   3   1/3/2017
B   3   1/3/2017
C   4   1/5/2016
C   5   1/7/2016

Note: I don't really need the index, it is just for making the question clearer.

I have tried `data.groupby('ID', as_index=False).nth([0,-1])`, but in the example above this outputs B only once.

Thanks in advance

piRSquared · Accepted Answer · 2018-05-24T03:45:55.390

`pd.concat`

pd.concat([d.iloc[[0, -1]] for _, d in df.groupby('ID')])

  ID      Date
0  A  1/1/2015
2  A  1/3/2017
3  B  1/3/2017
3  B  1/3/2017
4  C  1/5/2016
5  C  1/7/2016
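
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the question's sample data (string dates, default integer index); the names `df` and `result` are just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed default RangeIndex).
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Date': ['1/1/2015', '1/5/2016', '1/3/2017',
             '1/3/2017', '1/5/2016', '1/7/2016'],
})

# For each group, take the first and last row by position. In a single-row
# group, positions 0 and -1 are the same row, so that row appears twice.
result = pd.concat([d.iloc[[0, -1]] for _, d in df.groupby('ID')])
print(result)
```

Because `iloc` selects whole rows by position, the original index labels are carried along, which is why B shows up with index 3 twice.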

Using `agg`

df.groupby('ID').agg(['first', 'last']).stack().reset_index('ID')

      ID      Date
first  A  1/1/2015
last   A  1/3/2017
first  B  1/3/2017
last   B  1/3/2017
first  C  1/5/2016
last   C  1/7/2016

edited May 24 '18 at 03:45

answered May 24 '18 at 03:29


Thanks! I tried the code and it works. However, I found one odd behavior with `agg`: if, using the same example above, the Date in the last occurrence of C is NaN, `agg` simply returns the first Date of C (1/5/2016). Is there a way to keep the NaN? – user2552108 May 24 '18 at 03:40
Ahh! That is the intended behavior of `'first'` and `'last'` in an `agg` context. This is very similar to a question of mine https://stackoverflow.com/q/45744800/2336654 – piRSquared May 24 '18 at 03:44
See updated answer. I like the new answer better for your objective. – piRSquared May 24 '18 at 03:46
I like `agg` better because it is much faster. It's a shame that `first` and `last` give non-NaN values; it was the fastest implementation so far. Thanks again @piRSquared – user2552108 May 24 '18 at 04:06
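
To illustrate the NaN behavior discussed in these comments, here is a small sketch. The sample frame is the question's data with C's last Date replaced by NaN, as the commenter describes. The `head(1)`/`tail(1)` combination is not part of the original answer; it is offered only as one possible way to avoid the Python-level loop while keeping missing values, and it assumes the rows of each ID are contiguous, as in the sample.

```python
import numpy as np
import pandas as pd

# Question's sample data, with the last Date of C set to NaN (assumed setup).
df = pd.DataFrame({
    'ID': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Date': ['1/1/2015', '1/5/2016', '1/3/2017',
             '1/3/2017', '1/5/2016', np.nan],
})

# 'first' and 'last' skip missing values, so C's "last" falls back to the
# last non-missing Date, which here is 1/5/2016.
summary = df.groupby('ID')['Date'].agg(['first', 'last'])
print(summary)

# head(1) and tail(1) select whole rows by position, so the NaN survives.
# A single-row group contributes its row to both pieces, hence twice.
g = df.groupby('ID')
rows = pd.concat([g.head(1), g.tail(1)]).sort_index(kind='mergesort')
print(rows)
```

If the IDs were interleaved rather than contiguous, sorting by the index alone would not keep each ID's pair together; sorting by `ID` instead would be one way to handle that case.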
