22

I have a dataframe with sporadic dates as the index, and columns = 'id' and 'num'. I would like to pd.groupby the 'id' column, and apply the reindex to each group in the dataframe.

My sample dataset looks like this:

            id  num
2015-08-01  1   3
2015-08-05  1   5
2015-08-06  1   4
2015-07-31  2   1
2015-08-03  2   2
2015-08-06  2   3

My expected output once pd.reindex with ffill is:

            id  num
2015-08-01  1   3
2015-08-02  1   3
2015-08-03  1   3
2015-08-04  1   3
2015-08-05  1   5
2015-08-06  1   4
2015-07-31  2   1
2015-08-01  2   1
2015-08-02  2   1
2015-08-03  2   2
2015-08-04  2   2
2015-08-05  2   2
2015-08-06  2   3

I have tried this, among other things to no avail: newdf=df.groupby('id').reindex(method='ffill') Which returns error:AttributeError: Cannot access callable attribute 'reindex' of 'DataFrameGroupBy' objects, try using the 'apply' method

Any help would be much appreciated

clg4
  • 2,863
  • 6
  • 27
  • 32

2 Answers2

36

There's probably a slicker way to do this but this works:

def reindex_by_date(df):
    dates = pd.date_range(df.index.min(), df.index.max())
    return df.reindex(dates).ffill()

df.groupby('id').apply(reindex_by_date).reset_index(0, drop=True)
JoeCondron
  • 8,546
  • 3
  • 27
  • 28
  • This works. Awesome job. It does take a little while but I can't imagine there is a quicker more pythonic way to do this. Thanks a ton. – clg4 Aug 28 '15 at 16:54
  • A slightly different flavor is to write the return part of the function as `return df.resample('D').fillna(method='ffill') `. This has the added advantage of changing the 'D' in the resample part to 'B' if you only want business days for example (of course this depends on what you want). – Pilik Aug 28 '15 at 20:40
  • Are you sure that works `Pilik` because I initially tried `resample` but it didn't add in the missing days? You can achieve different periods like business days in `pd.date_range` as well, e.g. `pd.date_range(.., offset='B')` for business days. – JoeCondron Aug 29 '15 at 16:04
  • @JoeCondron, you're right, I didn't know `pd.date_range` had an offset option. I copied the example using `pd.read_clipboard()` and my solution with `resample` also yields the desired result. – Pilik Aug 31 '15 at 21:13
  • 1
    This solution does not work for me: all the rows become `NaN`. If I change function to use `df.reindex(dates, method = 'ffill')` it gives me `TypeError: Cannot compare type 'Timestamp' with type 'str'` – Giacomo Oct 04 '17 at 14:30
  • 1
    @giac_man It sounds as though you have date-like strings in your index. `DatetimIndex` will look identical to an index containing strings of the form `'YYYY-MM-DD'`. You can convert using `pd.to_datetime` – JoeCondron Oct 04 '17 at 16:31
0
from cmath import pi
from datetime import datetime
from enum import unique
import io
from itertools import product

import numpy as np
import pandas as pd


df = pd.DataFrame(columns=['id','num'])
df['id'] = [1,1,1,2,2,2]
df['num'] = [3,5,4,1,2,3]
df['date'] = pd.date_range('1990-07-31', periods=6, freq='D')
print(df)
"""
   id  num       date
0   1    3 1990-07-31
1   1    5 1990-08-01
2   1    4 1990-08-02
3   2    1 1990-08-03
4   2    2 1990-08-04
5   2    3 1990-08-05

"""


df = df.set_index('date')

df = df.reindex(df.index.repeat(df['num']), method='ffill')

df['num_count'] = df.groupby(level=0).cumcount()

df = df.reset_index()

print (df)
"""
         date  id  num  num_count
0  1990-07-31   1    3          0
1  1990-07-31   1    3          1
2  1990-07-31   1    3          2
3  1990-08-01   1    5          0
4  1990-08-01   1    5          1
5  1990-08-01   1    5          2
6  1990-08-01   1    5          3
7  1990-08-01   1    5          4
8  1990-08-02   1    4          0
9  1990-08-02   1    4          1
10 1990-08-02   1    4          2
11 1990-08-02   1    4          3
12 1990-08-03   2    1          0
13 1990-08-04   2    2          0
14 1990-08-04   2    2          1
15 1990-08-05   2    3          0
16 1990-08-05   2    3          1
17 1990-08-05   2    3          2
"""
Soudipta Dutta
  • 1,353
  • 1
  • 12
  • 7