
I have a time series that I resampled into the DataFrame df shown below.

My data runs from 6 June to 28 June. I want to extend it to cover 1 June to 30 June. The count column should be 0 only in the extended period and keep my real values from the 6th to the 28th.

Out[123]: 
                         count
Timestamp                    
2009-06-07 02:00:00         1
2009-06-07 03:00:00         0
2009-06-07 04:00:00         0
2009-06-07 05:00:00         0
2009-06-07 06:00:00         0

I need to make the

start date: 2009-06-01 00:00:00

end date: 2009-06-30 23:00:00

so that the data looks something like this:

                         count
Timestamp                    
2009-06-01 01:00:00         0
2009-06-01 02:00:00         0
2009-06-01 03:00:00         0

Is there an efficient way to do this? The only way I can think of is not very efficient. I have been trying since yesterday. Please help.

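    import numpy as np
    import pandas as pd

    # df2 is the resampled frame shown above (hourly index, 'count' column)
    index = pd.date_range('2009-06-01 00:00:00', '2009-06-30 23:00:00', freq='H')
    # np.zeros takes the shape as a single tuple
    df = pd.DataFrame(np.zeros((len(index), 1)), index=index)
    df.columns = ['zeros']
    result = pd.concat([df2, df])
    result1 = pd.concat([df, result])
    # fillna returns a new frame, so keep the result
    result1 = result1.fillna(0)
    del result1['zeros']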
sparktime12
  • I don't understand the expected output. Is it only three rows, or did you just post the head? Is it supposed to go until the 30th of June? If so, `ser.reindex(idx, fill_value=0)` should be enough. – ayhan Aug 27 '17 at 19:45
  • @ayhan Yes, it is supposed to go until the 30th of June. My data is from 6 June to 28 June. I want to extend it to cover 1 June to 30 June. The count column should be 0 only in the extended period and keep my real values from the 6th to the 28th. – sparktime12 Aug 27 '17 at 19:56
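
The `reindex` suggestion in the comment above can be sketched roughly as follows. This is only an illustration: the small `df` is a hypothetical stand-in for the resampled frame in the question, and `full_range` and `extended` are names made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for the already-resampled frame in the question
observed = pd.date_range("2009-06-07 02:00:00", periods=5, freq="h")
df = pd.DataFrame({"count": [1, 0, 0, 0, 0]}, index=observed)
df.index.name = "Timestamp"

# Full hourly range for June 2009 (start and end taken from the question)
full_range = pd.date_range(
    "2009-06-01 00:00:00", "2009-06-30 23:00:00", freq="h", name="Timestamp"
)

# Hours that are not in df get count 0; rows that exist keep their values
extended = df.reindex(full_range, fill_value=0)

print(extended.head(3))
```

Passing `fill_value=0` fills the new rows directly, so no separate `fillna` step is needed and the column keeps its integer dtype.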

2 Answers


You can create a new index with the desired start and end date/times, resample the time series data and aggregate by count, then reindex the result to the new index.

import pandas as pd

# create the index with the start and end times you want
t_index = pd.date_range(start='2009-06-01', end='2009-06-30 23:00:00', freq="1h")

# create the data frame
df = pd.DataFrame([['2009-06-07 02:07:42'],
                   ['2009-06-11 17:25:28'],
                   ['2009-06-11 17:50:42'],
                   ['2009-06-11 17:59:18']], columns=['daytime'])
df['daytime'] = pd.to_datetime(df['daytime'])

# resample the data to 1 hour, aggregate by counts,
# then reindex to the full range and fill the NaNs with 0
df2 = df.resample('1h', on='daytime').count().reindex(t_index).fillna(0)
James
  • Does resample have an `on` argument? I'm getting this error when running this code: `TypeError: resample() got an unexpected keyword argument 'on'` – sparktime12 Aug 27 '17 at 20:24
  • Ah, the API for resample was changed in pandas 0.19.0. You can accomplish the same thing in earlier versions, but it may take a bit more futzing. – James Aug 27 '17 at 20:34
  • Yes, this no longer works. Raises `Argument 'tuples' has incorrect type (expected numpy.ndarray, got DatetimeArray)` – coler-j Mar 20 '19 at 23:28
  • As mentioned [here](https://stackoverflow.com/a/62483787/297299), `DatetimeIndex` does not accept `start` and `end` arguments any longer. `date_range` can be used to accomplish the same thing (i.e. `pd.DatetimeIndex(pd.date_range(start='2009-06-01', end='2009-06-30 23:00:00', freq="1h"))`) – Toni Penya-Alba Nov 10 '20 at 07:47
  • @ToniPenya-Alba Looks like [date_range](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html#pandas.date_range) returns a DatetimeIndex already so the cast is not needed. – Jérôme Jun 24 '22 at 14:20
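
Pulling the comments above together, here is one possible variant that avoids both the `on` keyword and the `DatetimeIndex(...)` wrapper. It is a sketch rather than the answer's exact method: it counts events by summing a Series of ones indexed by the timestamps, the names `stamps`, `events`, `hourly` and `counts` are invented for the example, and it has not been checked against pre-0.19 pandas releases.

```python
import pandas as pd

# Sample event timestamps, copied from the answer above
stamps = pd.to_datetime([
    "2009-06-07 02:07:42",
    "2009-06-11 17:25:28",
    "2009-06-11 17:50:42",
    "2009-06-11 17:59:18",
])

# One row per event with the timestamps as the index,
# so resample does not need the `on` keyword
events = pd.Series(1, index=stamps, name="count")

# Summing the ones per hourly bin gives the number of events in each hour
hourly = events.resample("h").sum()

# date_range already returns a DatetimeIndex, so no extra cast is needed
t_index = pd.date_range(start="2009-06-01", end="2009-06-30 23:00:00", freq="h")

# Extend to the full month; hours outside the observed span become 0
counts = hourly.reindex(t_index, fill_value=0).to_frame()

print(counts[counts["count"] > 0])
```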

`DatetimeIndex()` no longer works with those arguments; it raises `__new__() got an unexpected keyword argument 'start'`.