1

I have this df:

                Close
Symbol  Date    
AA  1/20/1998   29.530121
    1/14/2021   25.09
AIG 1/20/1998   485.831665
    1/14/2021   41.299999
AXP 1/20/1998   17.698372
... ... ...
WMT 1/14/2021   146.970001
XOM 1/20/1998   15.515985
    1/14/2021   50.310001

Symbol and Date are indexes. There are some exceptions where there is a symbol with only 1 date (WMT in the case of this example).

I want this:

    Start Date  End Date    Start Close End Close
Symbol              
AA  1/20/1998   1/14/2021   29.530121   25.09
AIG 1/20/1998   1/14/2021   485.831665  41.299999
AXP 1/20/1998   1/14/2021   17.698372   123.78
... ...         ...         ...         ...
VZ  1/20/1998   1/14/2021   14.96       57.040001
WMT 1/14/2021   Nan         146.970001  Nan
XOM 1/20/1998   1/14/2021   15.515985   50.310001

Other ways of dealing with the exception (WMT) are fine. I just envision taking the first date and assuming it's the start.

I found this stack overflow post which suggests pivot tables for a similar issue but I have been unable the adapt it. I have tried this:

test1 = df.reset_index()
pd.pivot_table(test1,index=['Symbol'],columns='Date',values='Close')

which returns:

Date    1998-01-20  2021-01-14
Symbol      
AA  29.530121   25.090000
AIG 485.831665  41.299999
AXP 17.698372   123.779999
BA  27.150482   209.910004
... ...         ...

This merges Symbol gets Close in the same row, but doesn't deal with the dates how I'm trying to. I've since been checking the pd.pivot_table documentation but haven't been able to get any closer. Any suggestions, pivot_table or otherwise?

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
nonethewiser
  • 550
  • 1
  • 6
  • 18

1 Answers1

1

Taking a sample data and resetting the index

df = df.reset_index()

Symbol  Date    Close
0   AA  1/20/1998   29.530121
1   AA  1/14/2021   25.090000
2   AIG 1/20/1998   485.831665
3   AIG 1/14/2021   41.299999
4   WMT 1/14/2021   146.970001
5   XOM 1/20/1998   15.515985
6   XOM 1/14/2021   50.310001

You can groupby Symbol and take all the Date and Close in that group and create a single series of the two dates and the two closes, which would be taken up as a row corresponding the symbol

def func(df):
    dates = df['Date'].values.tolist()
    closes = df['Close'].values.tolist()
    if len(dates) == 1: # if there is one date,then End Date and End Close would be nans
        dates += [np.nan]
        closes += [np.nan]
    return pd.Series(dates + closes, index=['Start Date', 'End Date',
                                            'Start Close', 'End Close'])

df = df.groupby('Symbol')[['Date', 'Close']].apply(func)

df
    Start Date  End Date    Start Close End Close
Symbol              
AA  1/20/1998   1/14/2021   29.530121   25.090000
AIG 1/20/1998   1/14/2021   485.831665  41.299999
WMT 1/14/2021   NaN         146.970001  NaN
XOM 1/20/1998   1/14/2021   15.515985   50.310001

EDIT:

There is an easier way to do it. You can use unstack to create new columns. cumcount, which numbers each item in group, is used to know how many rows are there per group such that accordingly, columns can be updated.

out_df = df.set_index(['Symbol', df.groupby('Symbol').cumcount()]).unstack()
out_df.columns = [f'Start {c1}' if c2 == 0 else f'End {c1}' for c1, c2 in out_df.columns]
ggaurav
  • 1,764
  • 1
  • 10
  • 10