I am trying to fill missing dates by ID, however one of my indexed column has a duplicate date, so I tried this code but i keep getting this error "cannot reindex from a duplicate axis"
These are two codes I have tried:
FIRST:
udates=data['day'].unique()
filled_df = (data.set_index('day')
.groupby('ID')
.apply(lambda d: d.reindex(pd.date_range(min(data.day),
max(data.day),
freq='D')))
.drop('ID', axis=1)
.reset_index('ID')
.fillna(0))
filled_df
SECOND ONE:
users = pd.unique(data.ID)
data.day= pd.to_datetime(data.day)
dates = pd.date_range(min(data.day), max(data.day))
data.set_index('day', inplace=True)
df = pd.DataFrame(index=dates)
for u in users:
df[u] = data[data.ID==u].val
df = df.unstack().reset_index()
df.val.fillna(0, inplace=True)
df.val = df.val.astype(int)
df
this is an extract of the input data:
day | ID | val |
---|---|---|
01/26/2020 | AA | 100 |
01/28/2020 | AA | 200 |
01/26/2020 | BB | 100 |
01/27/2020 | BB | 100 |
01/29/2020 | BB | 40 |
This the output I want to get:
day | ID | val |
---|---|---|
01/26/2020 | AA | 100 |
01/27/2020 | AA | 0 |
01/28/2020 | AA | 200 |
01/26/2020 | BB | 100 |
01/27/2020 | BB | 100 |
01/28/2020 | BB | 0 |
01/29/2020 | BB | 40 |
Thank you
EDIT
df = pd.DataFrame({'ID': ['AA', 'AA', 'BB', 'BB','BB'],
'day': ['1/26/2020', '1/28/2020', '1/26/2020', '1/27/2020','1/29/2020'], 'val': [100, 200, 100, 100, 40]})