2

I have a following dataframe:

    user_id   end date start date    no
0         1 2018-03-01 2018-01-01    15
1         1 2018-04-01 2018-02-01    20
2         1 2018-05-01 2018-03-01    35
3         2 2018-07-01 2018-04-01    50
4         2 2018-07-01 2018-05-01    18

I want to create another dataframe such that for a given user id, I have start and end dates along with the last date for all months between given start and end dates e-g:

    user_id    date       no
          1 2018-01-01    15
          1 2018-02-28    15
          1 2018-03-01    15
          1 2018-02-01    20
          1 2018-03-31    20
          1 2018-04-01    20
          1 2018-03-01    35
          1 2018-04-30    35
          1 2018-05-01    35
          2 2018-04-01    50
          2 2018-05-31    50
          2 2018-06-30    50
          2 2018-07-01    50
          2 2018-05-01    18
          2 2018-06-30    18
          2 2018-07-01    1
Muhammad Hassan
  • 4,079
  • 1
  • 13
  • 27

1 Answers1

4

Use:

#rename columns for using itertuples
df = df.rename(columns=lambda x: x.replace(' ', '_'))
#convert columns to datetimes if necessary
df[['start_date','end_date']] = df[['start_date','end_date']].apply(pd.to_datetime)
#repeat datetimes to Series
s = pd.concat([pd.Series(r.Index,pd.date_range(r.start_date, r.end_date, freq='MS')) 
                         for r in df.itertuples()])
#swap keys with values
s = pd.Series(s.index, index=s, name='date')
#add to original
df = df.join(s).reset_index(drop=True)
#check values if match
mask = df[['start_date','end_date']].ne(df['date'], axis=0).all(axis=1)
df = df.drop(['start_date','end_date'], axis=1)
#set end of month for added datetimes
df.loc[mask, 'date'] += pd.offsets.MonthEnd()
#change order of columns
df = df[['user_id','date','no']]

print (df)
    user_id       date  no
0         1 2018-01-01  15
1         1 2018-02-28  15
2         1 2018-03-01  15
3         1 2018-02-01  20
4         1 2018-03-31  20
5         1 2018-04-01  20
6         1 2018-03-01  35
7         1 2018-04-30  35
8         1 2018-05-01  35
9         2 2018-04-01  50
10        2 2018-05-31  50
11        2 2018-06-30  50
12        2 2018-07-01  50
13        2 2018-05-01  18
14        2 2018-06-30  18
15        2 2018-07-01  18

EDIT:

For match starting and end of value not starting by first day of month use:

print (df)
   user_id    end date  start date  no
0        1  2018-05-10  2018-03-10  15
1        1  2018-04-01  2018-02-01  20
2        1  2018-05-01  2018-03-01  35
3        2  2018-07-01  2018-04-01  50
4        2  2018-07-01  2018-05-01  18

#convert columns to datetimes if necessary
df[['start date','end date']] = df[['start date','end date']].apply(pd.to_datetime)
#create new column with start of months
df['start_date'] = df['start date'].dt.to_period('m').dt.to_timestamp()
df['end_date'] = df['end date'].dt.to_period('m').dt.to_timestamp()
#repeat datetimes to Series
s = pd.concat([pd.Series(r.Index,pd.date_range(r.start_date, r.end_date, freq='MS')) 
                         for r in df.itertuples()])
#swap keys with values
s = pd.Series(s.index, index=s, name='date')
#add to original
df = df.join(s).reset_index(drop=True)
#check values if match
m1 = df['start_date'].ne(df['date'])
m2 = df['end_date'].ne(df['date'])

#replace values by start and end of groups
df['date'] = df['date'].where(m1, df['start date'])
df['date'] = df['date'].where(m2, df['end date'])

df = df.drop(['start_date','end_date','start date','end date'], axis=1)
#set end of month for added datetimes
df.loc[m1 & m2, 'date'] += pd.offsets.MonthEnd()
#change order of columns
df = df[['user_id','date','no']]

print (df)
    user_id       date  no
0         1 2018-03-10  15
1         1 2018-04-30  15
2         1 2018-05-10  15
3         1 2018-02-01  20
4         1 2018-03-31  20
5         1 2018-04-01  20
6         1 2018-03-01  35
7         1 2018-04-30  35
8         1 2018-05-01  35
9         2 2018-04-01  50
10        2 2018-05-31  50
11        2 2018-06-30  50
12        2 2018-07-01  50
13        2 2018-05-01  18
14        2 2018-06-30  18
15        2 2018-07-01  18
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This deosn't work in case, the start date/ end date in first dataframe has day other than first day of month. e-g: for this start and end date combination (2018-03-10, 2018-05-10) it would yield ('2018-04-01', '2018-05-01') but I want (2018-03-10, 2018-04-01, 2018-05-10) – Muhammad Hassan May 16 '19 at 09:54