0

This question has two parts:

1) Is there a better way to do this?

2) If NO to #1, how can I fix my date issue?

I have a dataframe as follows

  GROUP    DATE        VALUE    DELTA
    A   12/20/2015      2.5      ??
    A   11/30/2015      25        
    A    1/31/2016      8.3       
    B       etc         etc
    B       etc         etc  
    C       etc         etc
    C       etc         etc
  • This is a representation, there are close to 100 rows for each group (each row representing a unique date).

For each letter in GROUP, I want to find the change in value between successive dates. So for example for GROUP A I want the change between 11/30/2015 and 12/20/2015, which is -22.5. Currently I am doing the following:

df['DATE'] = pd.to_datetime(df['DATE'],infer_datetime_format=True)
df.sort_values('DATE',ascending=True)
df_out = []
for GROUP in df.GROUP.unique():
    x = df[df.GROUP == GROUP]
    x['VALUESHIFT'] = x['VALUE'].shift(+1)
    x['DELTA'] = x['VALUE'].sub(x['VALUESHIFT'])
    df_out.append(x)
df_out = pd.concat(df_out)

The challenge I am running into is the dates are not sorted correctly. So when the shift takes place and I calculate the delta it is not really the delta between successive dates.

Is this the right approach to handle? If so how can I fix my date issue? I have reviewed/tried the following to no avail:

Applying datetime format in pandas for sorting

how to make a pandas dataframe column into a datetime object showing just the date to correctly sort

doing calculations in pandas dataframe based on trailing row

Pandas - Split dataframe into multiple dataframes based on dates?

Community
  • 1
  • 1
Doug
  • 209
  • 2
  • 9
  • 19
  • One thing to add - I have not set an index on the dataframe. It just has the default numeric 0, 1, 2, 3, 4, index set when I create the dataframe. – Doug Feb 09 '17 at 15:09

1 Answers1

0

Answering my own question. This works:

df['DATE'] = pd.to_datetime(df['DATE'],infer_datetime_format=True)
df_out = []
for ID in df.GROUP.unique():
    x = df[df.GROUP == ID]
    x.sort_values('DATE',ascending=True, inplace=True)
    x['VALUESHIFT'] = x['VALUE'].shift(+1)
    x['DELTA'] = x['VALUE'].sub(x['VALUESHIFT'])
    df_out.append(x)
df_out = pd.concat(df_out)

1) Added inplace=True to sort value.

2) Added the sort within the for loop.

3) Changed by loop from using GROUP to ID since it is also the name of a column name, which I imagine is considered sloppy?

Doug
  • 209
  • 2
  • 9
  • 19