groupby pandas dataframe, take difference between value of latest and earliest date

Question

I have a Cumulative column and I want to groupby index and take the values corresponding to the latest date minus the values corresponding to the earliest date.

Very similar to this: group by pandas dataframe and select latest in each group

But take the difference between latest and earliest in each group.

Please study other questions on stackoverflow. There is a better way of asking. Provide some examples yourself. (The person who just -1 ed this question is not being very helpful either.) — Robert Altena, Jul 14 '18 at 01:14

score 0 · Answer 1 · answered Jul 14 '18 at 02:04

I'm a python rookie, and here is my solution:

import pandas as pd
from io import StringIO

csv = StringIO("""index    id     product   date
0   220    6647     2014-09-01 
1   220    6647     2014-09-03 
2   220    6647     2014-10-16
3   826    3380     2014-11-11
4   826    3380     2014-12-09
5   826    3380     2015-05-19
6   901    4555     2014-09-01
7   901    4555     2014-10-05
8   901    4555     2014-11-01""")

df = pd.read_table(csv, sep='\s+',index_col='index')
df['date']=pd.to_datetime(df['date'],errors='coerce')

df_sort=df.sort_values('date')
df_sort.drop(['product'], axis=1,inplace=True)

df_sort.groupby('id').tail(1).set_index('id')-df_sort.groupby('id').head(1).set_index('id')

groupby pandas dataframe, take difference between value of latest and earliest date

1 Answers1