Input
Client | First name | Last Name | Start Date | End Date | Amount | Invoice Date |
---|---|---|---|---|---|---|
XXX | John | Kennedy | 15-01-2021 | 28-02-2021 | 137,586.00 | 20-04-2021 |
YYY | Peter | Paul | 7-02-2021 | 31-03-2021 | 38,750.00 | 20-04-2021 |
ZZZ | Michael | K | 10-03-2021 | 29-04-2021 | 137,586.00 | 30-04-2021 |
Code
df = pd.read_excel ('file.xlsx',parse_dates=['Start Date','End Date'] )
df['Start Date'] = pd.to_datetime(df['Start Date'],format='%d-%m-%Y')
df['End Date'] = pd.to_datetime(df['End Date'],format='%d-%m-%Y')
df['r'] = df.apply(lambda x: pd.date_range(x['Start Date'],x['End Date']), axis=1)
df = df.explode('r')
print(df)
months = df['r'].dt.month
starts, ends = months.ne(months.groupby(level=0).shift(1)), months.ne(months.groupby(level=0).shift(-1))
df2 = pd.DataFrame({'First Name': df['First name'],
'Start Date': df.loc[starts, 'r'].dt.strftime('%Y-%m-%d'),
'End Date': df.loc[ends, 'r'].dt.strftime('%Y-%m-%d'),
'Date Diff': df.loc[ends, 'r'].dt.strftime('%d').astype(int)-df.loc[starts, 'r'].dt.strftime('%d').astype(int)+1})
df = df.loc[~df.index.duplicated(), :]
df2 = pd.merge(df, df2, left_index=True, right_index=True)
df2['Amount'] = df['Amount'].mul(df2['Date_Diff'])
print(df['Amount'])
print (df)
df.to_excel('report.xlsx', index=True)
Error ValueError: cannot reindex from a duplicate axis
Expected output
how to resolve this issue?