2

I have a df like below:

President   Start Date  End Date
B Clinton   1992-01-01  1999-12-31
G Bush      2000-01-01  2007-12-31
B Obama     2008-01-01  2015-12-31
D Trump     2016-01-01  2019-12-31 # not too far away!!

I want to create another df, something like this

timestamp   President
1992-01-01  B Clinton
1992-01-02  B Clinton
...
2000-01-01  G Bush
...

Basically I want to create a dataframe which its index is time stamp and then its content is selected based on the condition on the two columns of another df.

I feel there is a way within pandas to do this, but I am not sure how. I tried to use np.piecewise but seems generating the conditions will be very hard for me. How could I do this?

Bob Fang
  • 6,963
  • 10
  • 39
  • 72

3 Answers3

4

This is another unnesting problem

df['New']=[pd.date_range(x,y).tolist() for x , y in zip (df.StartDate,df.EndDate)]

unnesting(df,['New'])

FYI I have pasted the function here

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')
BENY
  • 317,841
  • 20
  • 164
  • 234
2

You can use pd.date_range to create range of dates from start and end values. Make sure Start and End dates are in datetime format.

s = df.set_index('President').apply(lambda x: pd.Series(pd.date_range(x['Start Date'], x['End Date'])), axis = 1).stack().reset_index(1, drop = True)

new_df = pd.DataFrame(s.index.values, index=s, columns = ['President'] )



            President
1992-01-01  B Clinton
1992-01-02  B Clinton
1992-01-03  B Clinton
1992-01-04  B Clinton
1992-01-05  B Clinton
1992-01-06  B Clinton
1992-01-07  B Clinton
1992-01-08  B Clinton
1992-01-09  B Clinton
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • 2
    You may not want to using apply . :-) https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code – BENY Feb 22 '19 at 18:43
0

Perhaps you could use a PeriodIndex instead of a DatetimeIndex because you are dealing with regularly spaced intervals of time, i.e., years.

# create a list of PeriodIndex objects with annual frequency
p_idxs = [pd.period_range(start, end, freq='A') for idx, (start, end) in df[['Start Date', 'End Date']].iterrows()]

# for each PeriodIndex create a DataFrame where 
# the number of president instances matches the length of the PeriodIndex object
df_list = []
for pres, p_idx in zip(df['President'].tolist(), p_idxs):
    df_ = pd.DataFrame(data=len(p_idx)*[pres], index=p_idx)
    df_list.append(df_)

# concatenate everything to get the desired output
df_desired = pd.concat(df_list, axis=0)
jeschwar
  • 1,286
  • 7
  • 10