-1

I have a dataframe (stockData) which has stock data for three different types of data (indexed by date) these are LAST,VOLUME,MKTCAP.

here is the head of the dataframe(stockData)

                           BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
date                                                                            
2001-01-02                  572               26605510               37494.60   
2001-01-03                  560               24715470               36708.00   
2001-01-04                  613               52781855               40182.15   
2001-01-05                  630               56600152               41296.50   
2001-01-08                  633               41014402               41493.15   

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN  
2001-01-03               225.00              444328736            145216.0020  
2001-01-04               239.00              488568000            154251.6643  
2001-01-05               242.25              237936704            156349.2288  
2001-01-08               227.75              658059776            146990.8642 

Is there a way to take one of these fields for all of the stocks create a new set of columns from this data with a new post fix (_HOLIDAY) so I end up with:

              BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  BBG.XLON.BTA.S_HOLIDAY  \
date                                                                            
2001-01-02                  572               26605510               37494.60                   NaN  
2001-01-03                  560               24715470               36708.00                   NaN  
2001-01-04                  613               52781855               40182.15                   NaN  
2001-01-05                  630               56600152               41296.50                   NaN  
2001-01-08                  633               41014402               41493.15                   NaN  

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  BBG.XLON.VOD.S_HOLIDAY  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN                   NaN  
2001-01-03               225.00              444328736            145216.0020                   NaN  
2001-01-04               239.00              488568000            154251.6643                   NaN  
2001-01-05               242.25              237936704            156349.2288                   NaN  
2001-01-08               227.75              658059776            146990.8642                   NaN 

Any assistance would be much appreciated.

Stacey
  • 4,825
  • 17
  • 58
  • 99

2 Answers2

1

is that what you want?

In [56]: newcols = df.columns.str.replace(r'\.S_.*','.S_HOLIDAY').unique().tolist()

In [57]: newcols
Out[57]: ['BBG.XLON.BTA.S_HOLIDAY', 'BBG.XLON.VOD.S_HOLIDAY']

then you can easily add new columns:

In [65]: for col in newcols:
   ....:         df[col] = np.nan
   ....:

In [66]: df
Out[66]:
            BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
2001-01-02                  572               26605510               37494.60
2001-01-03                  560               24715470               36708.00
2001-01-04                  613               52781855               40182.15
2001-01-05                  630               56600152               41296.50
2001-01-08                  633               41014402               41493.15

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  \
2001-01-02                  NaN                    NaN                    NaN
2001-01-03               225.00            444328736.0            145216.0020
2001-01-04               239.00            488568000.0            154251.6643
2001-01-05               242.25            237936704.0            156349.2288
2001-01-08               227.75            658059776.0            146990.8642

            BBG.XLON.BTA.S_HOLIDAY  BBG.XLON.VOD.S_HOLIDAY
2001-01-02                     NaN                     NaN
2001-01-03                     NaN                     NaN
2001-01-04                     NaN                     NaN
2001-01-05                     NaN                     NaN
2001-01-08                     NaN                     NaN

If the order of columns is important for you you can reorder it like this:

df = df[ordered_column_list]
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
0

You can use DataFrame.columns.values to get the column names and then strip the substring after and including the last dot (.):

names=[s[:s.rfind('.')] for s in df.columns.values]

Here I assume that your dataframe is called df. This will result in duplicate names (for .S_LAST, .S_VOLUME and .S_MKTCAP). Now you can use numpy.unique to remove duplicates:

import numpy as np
uNames=np.unique(names)

And now you can add your new column <name>.S_HOLIDAY assigning a NaN value:

for n in uNames:
   df[n+'.S_HOLIDAY']=np.NaN
Community
  • 1
  • 1
agold
  • 6,140
  • 9
  • 38
  • 54