0

I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"

These are two codes I have tried:

FIRST:

udates=data['day'].unique()
filled_df = (data.set_index('day')
             .groupby('ID')
             .apply(lambda d: d.reindex(pd.date_range(min(data.day),
                                                      max(data.day),
                                                      freq='D')))
             .drop('ID', axis=1)
             .reset_index('ID')
             .fillna(0))
filled_df

SECOND ONE:

users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)

df = pd.DataFrame(index=dates)
for u in users:
    df[u] = data[data.ID==u].val

df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df

this is an extract of the input data:

day ID val
01/26/2020 AA 100
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/29/2020 BB 40

This the output I want to get:

day ID val
01/26/2020 AA 100
01/27/2020 AA 0
01/28/2020 AA 200
01/26/2020 BB 100
01/27/2020 BB 100
01/28/2020 BB 0
01/29/2020 BB 40

Thank you

EDIT


df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'], 
                   'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})

sammywemmy
  • 27,093
  • 4
  • 17
  • 31
Sissi
  • 77
  • 7
  • Can you include the sample data that would be used to create your desired output? See: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for tips – Henry Ecker Apr 28 '21 at 20:56
  • thanks, I edit the question – Sissi Apr 28 '21 at 21:05
  • 1
    ``df.set_index('day').groupby("ID").apply(lambda d: d.reindex(pd.date_range(d.index.min(),d.index.max(), freq='D')))`` works fine for your code. Kindly share an example data with duplicates *within* a group – sammywemmy Apr 28 '21 at 21:16

1 Answers1

1
>>> df.set_index("day") \
      .groupby("ID")["val"] \
      .resample("D") \
      .first() \
      .fillna(0) \
      .reset_index()

   ID        day    val
0  AA 2020-01-26  100.0
1  AA 2020-01-27    0.0
2  AA 2020-01-28  200.0
3  BB 2020-01-26  100.0
4  BB 2020-01-27  100.0
5  BB 2020-01-28    0.0
6  BB 2020-01-29   40.0

Note: the function first() is useless. It's because Resampler.fillna() only works with the method keyword. You cannot pass a value unlike DataFrame.fillna().

Corralien
  • 109,409
  • 8
  • 28
  • 52