I am trying to get the first and last occurrence of each ID in a data frame. If an ID appears only once, its last occurrence should be the same row as its first, so that row appears twice in the output.

For example, given data like this:

ID  Date
A   1/1/2015
A   1/5/2016
A   1/3/2017
B   1/3/2017
C   1/5/2016
C   1/7/2016

the desired output is:

ID  Index   Date
A   0   1/1/2015
A   2   1/3/2017
B   3   1/3/2017
B   3   1/3/2017
C   4   1/5/2016
C   5   1/7/2016

Note: I don't really need the index, it is just for making the question clearer.

I have tried `data.groupby('ID', as_index=False).nth([0,-1])`, but in the example above this outputs B only once.

Thanks in advance

user2552108

1 Answer

pd.concat

pd.concat([d.iloc[[0, -1]] for _, d in df.groupby('ID')])

  ID      Date
0  A  1/1/2015
2  A  1/3/2017
3  B  1/3/2017
3  B  1/3/2017
4  C  1/5/2016
5  C  1/7/2016
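
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame `df` is rebuilt by hand from the question's sample data, and the variable name `result` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (dates kept as plain strings).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "Date": ["1/1/2015", "1/5/2016", "1/3/2017",
             "1/3/2017", "1/5/2016", "1/7/2016"],
})

# For each group, pick the first and last rows by position.
# A single-row group (B) yields the same row twice.
result = pd.concat([d.iloc[[0, -1]] for _, d in df.groupby("ID")])
print(result)
```

Because `iloc` selects by position rather than by value, the original index labels are preserved and B's only row shows up twice.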

Using agg

df.groupby('ID').agg(['first', 'last']).stack().reset_index('ID')

      ID      Date
first  A  1/1/2015
last   A  1/3/2017
first  B  1/3/2017
last   B  1/3/2017
first  C  1/5/2016
last   C  1/7/2016
piRSquared
  • Thanks! I tried the code and it works. However, I found one odd behavior with `agg`: if, using the same example above, the Date in the last occurrence of C is NaN, `agg` simply returns the first Date of C (1/5/2016). Is there a way to keep the NaN? – user2552108 May 24 '18 at 03:40
  • Ahh! That is the intended behavior of `'first'` and `'last'` in an `agg` context. This is very similar to a question of mine https://stackoverflow.com/q/45744800/2336654 – piRSquared May 24 '18 at 03:44
  • See updated answer. I like the new answer better for your objective. – piRSquared May 24 '18 at 03:46
  • I like `agg` better as it gives much faster results. It's a shame that `first` and `last` return non-NaN values, since it was the fastest implementation so far. Thanks again @piRSquared – user2552108 May 24 '18 at 04:06
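
The NaN behavior discussed in these comments can be seen in a small sketch. It assumes the question's sample data with C's last Date replaced by NaN. The `head(1)`/`tail(1)` variant is not from the answer above; it is one possible positional alternative that avoids a Python-level loop over groups, and the names `agg_last_c` and `positional` are illustrative only.

```python
import numpy as np
import pandas as pd

# Question's sample data, with the last Date of C set to NaN (assumption).
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "Date": ["1/1/2015", "1/5/2016", "1/3/2017",
             "1/3/2017", "1/5/2016", np.nan],
})

# 'last' returns the last non-null value per group, so the NaN is skipped.
agg_last_c = df.groupby("ID")["Date"].agg("last")["C"]

# head(1) and tail(1) select rows by position, so a NaN in the last row is kept.
g = df.groupby("ID")
positional = pd.concat([g.head(1), g.tail(1)]).sort_index(kind="mergesort")
print(positional)
```

Here `agg_last_c` is `'1/5/2016'`, while the last row of `positional` still holds NaN for C. Whether this is faster than the `pd.concat` loop on a given data set would need to be timed.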