
I have grouped time series with gaps. I want to fill the gaps, respecting the groupings.

date is unique within each id.

The following works but gives me zeros where I want NaNs:

data.groupby('id').resample('D', on='date').sum()\
    .drop('id', axis=1).reset_index()

The following attempts do not work for some reason:

data.groupby('id').resample('D', on='date').asfreq()\
    .drop('id', axis=1).reset_index()

data.groupby('id').resample('D', on='date').fillna('pad')\
    .drop('id', axis=1).reset_index()

I get the following error: Upsampling from level= or on= selection is not supported, use .set_index(...) to explicitly set index to datetime-like

I've tried to use pandas.Grouper with set_index (multi-level index or single), but it does not seem to upsample my date column so that I get continuous dates, or it does not respect the id column.
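
Roughly (not the exact code), those attempts looked something like this:

# Grouper combined with the id key: only the observed dates show up,
# the missing days are not inserted
data.set_index('date').groupby(['id', pd.Grouper(freq='D')]).sum()

# Grouper on the date alone: continuous daily dates, but the id groups
# get combined
data.set_index('date').groupby(pd.Grouper(freq='D')).sum()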

Pandas is version 0.23

Try it yourself:

import pandas as pd
from datetime import datetime

data = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2],
    'date': [
        datetime(2018, 1, 1),
        datetime(2018, 1, 5),
        datetime(2018, 1, 10),
        datetime(2018, 1, 1),
        datetime(2018, 1, 5),
        datetime(2018, 1, 10)],
    'value': [100, 110, 90, 50, 40, 60]})

# Works but gives zeros
data.groupby('id').resample('D', on='date').sum()
# Fails
data.groupby('id').resample('D', on='date').asfreq()
data.groupby('id').resample('D', on='date').fillna('pad')
CodeMonkey

1 Answer


Create a DatetimeIndex and remove the on parameter from resample:

print (data.set_index('date').groupby('id').resample('D').asfreq())
                id
id date           
1  2018-01-01  1.0
   2018-01-02  NaN
   2018-01-03  NaN
   2018-01-04  NaN
   2018-01-05  1.0
   2018-01-06  NaN
   2018-01-07  NaN
   2018-01-08  NaN
   2018-01-09  NaN
   2018-01-10  1.0
2  2018-01-01  2.0
   2018-01-02  NaN
   2018-01-03  NaN
   2018-01-04  NaN
   2018-01-05  2.0
   2018-01-06  NaN
   2018-01-07  NaN
   2018-01-08  NaN
   2018-01-09  NaN
   2018-01-10  2.0

print (data.set_index('date').groupby('id').resample('D').fillna('pad'))
#alternatives
#print (data.set_index('date').groupby('id').resample('D').ffill())
#print (data.set_index('date').groupby('id').resample('D').pad())
               id
id date          
1  2018-01-01   1
   2018-01-02   1
   2018-01-03   1
   2018-01-04   1
   2018-01-05   1
   2018-01-06   1
   2018-01-07   1
   2018-01-08   1
   2018-01-09   1
   2018-01-10   1
2  2018-01-01   2
   2018-01-02   2
   2018-01-03   2
   2018-01-04   2
   2018-01-05   2
   2018-01-06   2
   2018-01-07   2
   2018-01-08   2
   2018-01-09   2
   2018-01-10   2

EDIT:

If you want to use sum with missing values, you need the min_count=1 parameter - see sum:

min_count : int, default 0
    The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

print (data.groupby('id').resample('D', on='date').sum(min_count=1))
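
If you then need a flat frame again like the pipeline in the question, the same drop/reset_index works on the upsampled result (a sketch with the question's sample data):

# Upsample per id, keep NaN for the missing days, then flatten back to columns
filled = (data.set_index('date')
              .groupby('id')
              .resample('D')
              .asfreq()
              .drop('id', axis=1)   # the id is already in the index
              .reset_index())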
jezrael
  • It's not working due to the dates not being unique. I get the following error: `cannot reindex a non-unique index with a method or limit`. Each id represents a time series; they can overlap, like, say, you have multiple weather stations where you want to fill in the missing dates with NaN so as to get a continuous daily time series. – CodeMonkey Jan 29 '19 at 13:28
  • @CodeMonkey - So you need to preprocess the data - first aggregate `sum` or `mean` for unique datetimes and then apply this solution (see the sketch after these comments). Give me some time for a data sample. – jezrael Jan 29 '19 at 13:33
  • I've added sample data for you. – CodeMonkey Jan 29 '19 at 13:36
  • @CodeMonkey - I get no error, can you check now? Tested in pandas 0.24.0 – jezrael Jan 29 '19 at 13:40
  • @CodeMonkey - Just edited the answer for `NaN`s with missing values and `sum`. – jezrael Jan 29 '19 at 13:58
  • Thank you! Also, running the whole thing I found out I failed one of my own asserts. I had inadvertently deleted some code, so the dates were NOT unique within each id. My mistake. – CodeMonkey Jan 29 '19 at 13:58
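
For reference, the preprocessing mentioned in the comments might look like this (a sketch, assuming duplicate (id, date) rows should be summed before resampling):

# Collapse duplicate (id, date) rows first, then apply the answer above
dedup = data.groupby(['id', 'date'], as_index=False).sum()
dedup.set_index('date').groupby('id').resample('D').asfreq()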