6

I have imported data into dataframe that looks like this

           VMGI US Equity   VMGI Open   VMGI High  VMGI Low  VMGI Px_last   VMGI Volume SPOM US Equity  SPOM Open   SPOM High   SPOM Low   SPOM Px_last SPOM Volume
Date
12/31/2012  12/31/2012      0.009       0.011      0.009         0.009      105726      12/31/2012      0.4575      0.4575      0.2925      0.3975       8890
1/1/2013    1/1/2013        0.009       0.011      0.009         0.009      105726      1/1/2013        0.4575      0.4575     0.2925      0.3975      8890
1/2/2013    1/2/2013        0.009       0.01       0.008         0.01       188150      1/2/2013        0.3975      0.3975     0.3225      0.3225      3400
1/3/2013    1/3/2013        0.011       0.018      0.011         0.015       169890     1/3/2013        0.34        0.3738     0.28        0.29       48933
1/4/2013    1/4/2013        0.015       0.018      0.014         0.018       33500      1/4/2013        0.36        0.4        0.3175      0.3175      3610

Each 6th column is a new stock. The rows go on for 1340 rows. I want to re-organize in a multi-index (I think) to create data that looks like this because I wanted to add additional columns for each stock. I was able to get the stock names with the following code

index2 =index1[0::6]     >>> which results in an object as follows (the first column for each stock) 
Index(['VMGI US Equity', 'SPOM US Equity', 'OPTL US Equity', 'FRHV US Equity', etc....

Ultimately I want the dataframe to look like a index with each stock

VMGI US Equity  VMGI US Equity  VMGI Open   VMGI High   VMGI Low    VMGI Px_last    VMGI Volume
                 12/31/2012       0.009      0.011      0.009       0.009            105726
                 1/1/2013         0.009      0.011      0.009       0.009            105726
                 1/2/2013         0.009      0.01       0.008       0.01             188150
                 1/3/2013         0.011      0.018      0.011       0.015          169890
                 1/4/2013         0.015      0.018      0.014       0.018          33500
SPOM US Equity  SPOM US Equity  SPOM Open   SPOM High   SPOM Low    SPOM Px_last    SPOM Volume
                12/31/2012       0.4575     0.4575      0.2925      0.3975          8890
                1/1/2013         0.4575     0.4575      0.2925      0.3975          8890

I have tried set_index and got the follow error.

df2.index = df_clean_penny1.set_index(index2)
ValueError: Length mismatch: Expected axis has 1340 elements, new values have 65 elements

From other posts I have also tried MultiIndex.from_arrays() but also could not get it to work. any help/guidance is appreciated

Roger Vadim
  • 373
  • 2
  • 12
J Westwood
  • 421
  • 1
  • 9
  • 22

1 Answers1

6

You can use the str accessor for pd.Index objects and create a pd.MultiIndex with split and the expand=True argument

df.columns = df.columns.str.split(' ', 1, expand=True)

Then you can stack the first level of the column index you just created

df.stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

A variation of this without editing the columns object in place would be to use the set_axis method. pd.DataFrame.set_axis as of Pandas version 0.21 now accepts an inplace=False argument which allows for pipelining.

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0)

                   High     Low    Open  Px_last   US Equity  Volume
Date                                                                
12/31/2012 SPOM  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
           VMGI  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
1/1/2013   SPOM  0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
           VMGI  0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
1/2/2013   SPOM  0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
           VMGI  0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
1/3/2013   SPOM  0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
           VMGI  0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
1/4/2013   SPOM  0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
           VMGI  0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

To take is one step further, we can swap the levels of the index and sort to improve the layout.

df.set_axis(df.columns.str.split(' ', 1, expand=True), 1, 0).stack(0) \
    .swaplevel(0, 1).sort_index().reindex(df.index, level=1)

                   High     Low    Open  Px_last   US Equity  Volume
     Date                                                           
SPOM 12/31/2012  0.4575  0.2925  0.4575   0.3975  12/31/2012    8890
     1/1/2013    0.4575  0.2925  0.4575   0.3975    1/1/2013    8890
     1/2/2013    0.3975  0.3225  0.3975   0.3225    1/2/2013    3400
     1/3/2013    0.3738  0.2800  0.3400   0.2900    1/3/2013   48933
     1/4/2013    0.4000  0.3175  0.3600   0.3175    1/4/2013    3610
VMGI 12/31/2012  0.0110  0.0090  0.0090   0.0090  12/31/2012  105726
     1/1/2013    0.0110  0.0090  0.0090   0.0090    1/1/2013  105726
     1/2/2013    0.0100  0.0080  0.0090   0.0100    1/2/2013  188150
     1/3/2013    0.0180  0.0110  0.0110   0.0150    1/3/2013  169890
     1/4/2013    0.0180  0.0140  0.0150   0.0180    1/4/2013   33500

Strictly speaking, that last bit with the reindex isn't exactly necessary. But it bothered me that I might be rearranging dates. So I put them back in place.

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 1
    This is so clean and elegant. – cs95 Feb 20 '18 at 00:22
  • 1
    I will give it a try. ty. Yes Bloomberg (one of the non developers/engineers) as you can tell by the question – J Westwood Feb 20 '18 at 00:27
  • @piRSquared is there a way to ultimately end up with stock as first level in multi-index and then combine the full series of dates with each individual stock. So that I could somehow "call/index" a single individual stock out of DF? – J Westwood Feb 20 '18 at 00:39
  • @JWestwood I believe what I've been mucking about with does that. See my updates and let me know if that answers the question. – piRSquared Feb 20 '18 at 00:41
  • it will take me some time to figure out what you've done but it seems to work well. marked as correct. ty – J Westwood Feb 20 '18 at 00:50
  • @piRSquared. I am trying to add rolling averages to this now with this code >> df2['10D_MA'] = df2.groupby(level=0).rolling(10).mean() >>>and I get the following error NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented which seems to indicate I have to choose a column to measure the rolling but adding df2['Px_last'] in front of rolling alsodid not work. I would open another question but i see this post from you so I thought you may know the answer. appreciate if you can help. https://stackoverflow.com/questions/38334832/rolling-averages-on-groups – J Westwood Feb 21 '18 at 03:08
  • you need to create a function to use in `df2.groupby(level=0).apply(lambda df: df.rolling(10).mean())`. If that doesn't work, ask a new question. – piRSquared Feb 21 '18 at 03:10