Pandas data-frame based on months between date columns and average of value

Question

I am working with a pandas dataframe, using df.groupby() was able to end in this that includes ['start_date'] and ['end_date'] and a value for an specific id.

| id         | start_date       | end_date       |value|
|:-----------|------------======|:---------------|-----|
| 1          |        02-01-2018|      05-31-2018|   40|
| 2          |        01-01-2018|      03-31-2018| 12.3|

There is any

And this is the dataframe im trying to end with: ( value is value/nummonths between start_date and end_date)

    |id          | month_belongs    | value|
    |------------|------------------|------|
    | 1          |        02-01-2018|    10|
    | 1          |        03-01-2018|    10|
    | 1          |        04-01-2018|    10|
    | 1          |        05-01-2018|    10|
    | 2          |        01-01-2018|   4.1|
    | 2          |        02-01-2018|   4.1|
    | 2          |        03-01-2018|   4.1|

I feel like this is going to be easier to solve starting with the original `DataFrame` before the groupby. — ALollz, Apr 11 '19 at 19:11
The first dataframe comes with start date and end date exactly the same I grouped by to sum all values for the same periods. — gizq, Apr 11 '19 at 19:14

score 2 · Accepted Answer · answered Apr 11 '19 at 19:19

More like a unnesting problem ,the hidden key was created by date_range

#df.start_date=pd.to_datetime(df.start_date,dayfirst=False)
#df.end_date=pd.to_datetime(df.end_date,dayfirst=False)
df['month_belongs']=[pd.date_range(x,y,freq='MS')for x , y in zip(df.start_date,df.end_date)]
df=unnesting(df,['month_belongs'])
df['value']/=df['value'].groupby(level=0).transform('size').values
df
Out[301]: 
  month_belongs  id start_date   end_date  value
0    2018-02-01   1 2018-02-01 2018-05-31   10.0
0    2018-03-01   1 2018-02-01 2018-05-31   10.0
0    2018-04-01   1 2018-02-01 2018-05-31   10.0
0    2018-05-01   1 2018-02-01 2018-05-31   10.0
1    2018-01-01   2 2018-01-01 2018-03-31    4.1
1    2018-02-01   2 2018-01-01 2018-03-31    4.1
1    2018-03-01   2 2018-01-01 2018-03-31    4.1

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

this unnesting solution is really elegant, Thank you. – gizq Apr 11 '19 at 19:51 — gizq, Apr 11 '19 at 19:51

score 1 · Answer 2 · answered Apr 11 '19 at 23:39

Premise: I am a newbie to panda and mostly to coding too. I post my solution more to receive indications on better ways to do it than anything else. For me it was already nice to be able to get to this point, and I felt the code was at least clean enough to show it (hope it's ok). I will probably have to spend some time wrapping my head around the accepted answer.

import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

start=[["02-01-2018", "05-31-2018", 40],
    ["01-01-2018", "03-31-2018", 12.3]]

df=pd.DataFrame(start,columns = ['std','end','v'])
df['std']=pd.to_datetime(df['std'])
df['end']=pd.to_datetime(df['end'])
df2=pd.DataFrame(columns = ['id', 'month_belongs', 'value'])
ix=0 # I'm sure there must be a better way here, than needing an index
for index, row in df.iterrows():
    e,s =row['end'], row['std']
    difference = relativedelta(e, s)
    months = difference.months+1
    while s <= e:
        df2.loc[ix]=[index,s,row['v']/months]
        s+= relativedelta(months=1)
        ix+=1
print(df2)

outputs:

  id month_belongs  value
0  0    2018-02-01   10.0
1  0    2018-03-01   10.0
2  0    2018-04-01   10.0
3  0    2018-05-01   10.0
4  1    2018-01-01    4.1
5  1    2018-02-01    4.1
6  1    2018-03-01    4.1

score 0 · Answer 3 · answered Apr 11 '19 at 19:32

0

import pandas as pd

df["value"] = df.apply(
    lambda x: x["value"]/(
        (pd.to_datetime(x["end_date"]) + pd.Timedelta(days=1)).month -
        pd.to_datetime(x["start_date"]).month), 
    axis=1
)

answered Apr 11 '19 at 19:32

Lante Dellarovere

1,838
2
7
10

Pandas data-frame based on months between date columns and average of value

3 Answers3