1

I would like to add a column (with header 'acc_dates') which would give the increasing count of the dates in a dataframe with datetime index. Here is an example:

import pandas as pd
import datetime as dt

datarange=pd.date_range('01-05-2018 00:00:00', periods=50, freq="4H")
range_series_1=pd.Series(np.random.randint(-5,3,size=50).astype(float), index=datarange)
df=pd.DataFrame({'value1':range_series_1})

df.head(5)
Out[287]: 
                     value1
datetime                   
2018-01-05 00:00:00     1.0
2018-01-05 04:00:00    -2.0
2018-01-05 08:00:00    -2.0
2018-01-05 12:00:00    -3.0
2018-01-05 16:00:00     1.0

If I apply the cumcount(), the 'value1' column disappears. This is what I type and what I get:

df.groupby(df.index.date).cumcount().to_frame('acc_dates').head(15)
Out[288]: 
                     acc_dates
datetime                      
2018-01-05 00:00:00          0
2018-01-05 04:00:00          1
2018-01-05 08:00:00          2
2018-01-05 12:00:00          3
2018-01-05 16:00:00          4
2018-01-05 20:00:00          5
2018-01-06 00:00:00          0
2018-01-06 04:00:00          1
2018-01-06 08:00:00          2
2018-01-06 12:00:00          3
2018-01-06 16:00:00          4
2018-01-06 20:00:00          5
2018-01-07 00:00:00          0
2018-01-07 04:00:00          1
2018-01-07 08:00:00          2

I can merge the two dataframes on 'datetime' to get the desired output, but I would prefer to not apply the pd.merge() method. Here's the output I expect:

Out[296]:
                     value1  acc_dates
datetime                              
2018-01-05 00:00:00     1.0          0
2018-01-05 04:00:00    -2.0          1
2018-01-05 08:00:00    -2.0          2
2018-01-05 12:00:00    -3.0          3
2018-01-05 16:00:00     1.0          4
2018-01-05 20:00:00     0.0          5
2018-01-06 00:00:00     2.0          0
2018-01-06 04:00:00    -3.0          1
2018-01-06 08:00:00    -5.0          2
2018-01-06 12:00:00    -5.0          3
2018-01-06 16:00:00     1.0          4
2018-01-06 20:00:00    -2.0          5
2018-01-07 00:00:00     2.0          0
2018-01-07 04:00:00     1.0          1
2018-01-07 08:00:00    -1.0          2
2018-01-07 12:00:00    -2.0          3

Ideally, I am looking for a method to somehow create and add the column in the initial df inplace.

Is this feasible? I welcome your suggestions.

Newbielp
  • 431
  • 3
  • 16

1 Answers1

1

I think merge or concat are not necessary, only assign output to new column:

df['acc_dates'] = df.groupby(df.index.date).cumcount()
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    YES, thank you so much @jezrael! I had tried something like that initially, but I would get errors - who knows what I was actually typing? – Newbielp Oct 25 '19 at 07:41
  • If I could have your input to this too, it would be awesome ^_^ (https://stackoverflow.com/questions/58543914/perform-calculation-between-rows-with-datetime-index-and-place-them-into-new-col) – Newbielp Oct 25 '19 at 08:03