This question has two parts:
1) Is there a better way to do this?
2) If NO to #1, how can I fix my date issue?
I have a dataframe as follows:
GROUP  DATE        VALUE  DELTA
A      12/20/2015  2.5    ??
A      11/30/2015  25
A      1/31/2016   8.3
B      etc         etc
B      etc         etc
C      etc         etc
C      etc         etc
(This is just a representation; there are close to 100 rows for each group, each row representing a unique date.)
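For reference, here is a minimal snippet that rebuilds the GROUP A rows above (the B and C rows follow the same shape, so I've left them out):

import pandas as pd

# Toy version of the frame above, GROUP A rows only;
# the real frame has ~100 rows per group and several groups.
df = pd.DataFrame({
    'GROUP': ['A', 'A', 'A'],
    'DATE':  ['12/20/2015', '11/30/2015', '1/31/2016'],
    'VALUE': [2.5, 25, 8.3],
})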
For each letter in GROUP, I want to find the change in VALUE between successive dates. For example, for GROUP A I want the change between 11/30/2015 and 12/20/2015, which is -22.5. Currently I am doing the following:
import pandas as pd

df['DATE'] = pd.to_datetime(df['DATE'], infer_datetime_format=True)
df.sort_values('DATE', ascending=True)

df_out = []
for GROUP in df.GROUP.unique():
    x = df[df.GROUP == GROUP]
    x['VALUESHIFT'] = x['VALUE'].shift(+1)        # previous row's VALUE within this group
    x['DELTA'] = x['VALUE'].sub(x['VALUESHIFT'])  # change from the previous row
    df_out.append(x)

df_out = pd.concat(df_out)
The challenge I am running into is that the dates are not sorted correctly, so when the shift takes place and I calculate the delta, it is not really the delta between successive dates.
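To make the symptom concrete with the toy GROUP A rows from above: when the rows stay in the stored order rather than date order, the shift pairs each row with whatever happens to sit above it, and the delta is not the -22.5 I expect:

import pandas as pd

# Same toy GROUP A rows as above, in the order they are stored (not date order).
a = pd.DataFrame({
    'GROUP': ['A', 'A', 'A'],
    'DATE':  ['12/20/2015', '11/30/2015', '1/31/2016'],
    'VALUE': [2.5, 25, 8.3],
})

a['DELTA'] = a['VALUE'].sub(a['VALUE'].shift(1))
print(a)
# DELTA comes out as NaN, 22.5, -16.7 in this order, whereas in true date
# order (11/30, 12/20, 1/31) I would expect NaN, -22.5, 5.8.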
Is this the right approach? If so, how can I fix my date issue? I have reviewed/tried the following to no avail:
Applying datetime format in pandas for sorting
how to make a pandas dataframe column into a datetime object showing just the date to correctly sort
doing calculations in pandas dataframe based on trailing row
Pandas - Split dataframe into multiple dataframes based on dates?