0

I seem to be coming across an issue in trying to add columns to my dataframe, specific to each month for a given start date and end date.

Input:

In [1]: df = pd.DataFrame([[01/04/2022, 03/05/2022], [07/04/2022,01/05/2022]], columns=['Contract From', '
Contract To'])

I apply the following to the specific columns I need to use:

df['Contract To'] = pd.to_datetime(df['Contract To'], dayfirst=True)
df['Start Date'] = df['Contract From']
df['End Date'] = df['Contract To']
df['End Date'].fillna("31/03/2023", inplace=True)
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

Then I try the following:

res = df.join(df.apply(lambda x: pd.Series(pd.date_range(x['Start Date'], x['End Date'], freq='D').to_period('M')), axis=1).apply(pd.value_counts, axis=1).fillna(0).astype(int))

However on opening the new excel spreadsheet that is saved I'm not seeing any additional columns being added. I'd expect to see the following:

Start Date End Date 2022-04 2022-05 2022-06
01/04/2022 03/05/2022 30 3 0
07/04/2022 01/05/2022 23 1 0

Any help with this would be greatly appreciated, I think this is throwing me as I'm not getting any error pop up in relation to the above so not sure what's exactly going wrong here

Edit - Added example input

  • please share original df , and also the result which it is showing for now. – murari prasad Oct 19 '22 at 12:07
  • Hi, can't share the original df as there's some unanonymised data in there, but in terms of the results, I'm getting just the first two columns in the table above, not getting the last 3 columns which is what i'd expect – youguesswho Oct 19 '22 at 12:12
  • Could you provide a mock example of the input and output dataframe? Could you have a look to: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples For instance, as a reader I am having a hard time understanding the link between the Excel export and the `res` dataframe. – Adrien Pacifico Oct 19 '22 at 12:36
  • Well boy do I feel stupid! Just changed the res to df and it works! – youguesswho Oct 19 '22 at 12:46

1 Answers1

0

Seems to work

  start_date = ['01/04/2022', '07/04/2022']
    end_date = ['03/05/2022', '01/05/2022']
    df = pd.DataFrame(list(zip(start_date, end_date)), columns=['start_date', 'end_date'])
    
    df['start_date'] = pd.to_datetime(df['start_date'],dayfirst=True)
    df['end_date'] = pd.to_datetime(df['end_date'],dayfirst=True)
    res = df.join(df.apply(lambda x: pd.Series(pd.date_range(x['start_date'], x['end_date'], freq='D').to_period('M')), axis=1).apply(pd.value_counts, axis=1).fillna(0).astype(int))
    print(res)
      start_date   end_date  2022-04  2022-05
    0 2022-04-01 2022-05-03       30        3
    1 2022-04-07 2022-05-01       24        1
Mathieu P.
  • 344
  • 1
  • 10