Removing or merging multiindex columns in pandas dataframe without losing data

Question

I have a dataframe created by the pdblp module for python. The df contains price data with dates as index, and what looks like a multiindex / two layers of columns (ticker and 'PX_LAST'). In order to work with the data in an easier way, I'd like to get rid of the sub-header 'PX_LAST' (I don't need it, and it messes with some lookup functions I am doing separately).

This is how the dataframe looks:

df_px_orig.columns

Out[60]: 
MultiIndex(levels=[['IKH16 Comdty', 'IKM16 Comdty', 'IKU16 Comdty', 'RXH16 Comdty', 'RXM16 Comdty', 'RXU16 Comdty'], ['PX_LAST']],
           labels=[[3, 4, 5, 0, 1, 2], [0, 0, 0, 0, 0, 0]],
           names=['ticker', 'field'])

df_px_orig.head()

Out[41]: 
ticker     RXH16 Comdty RXM16 Comdty RXU16 Comdty IKH16 Comdty IKM16 Comdty  \
field           PX_LAST      PX_LAST      PX_LAST      PX_LAST      PX_LAST   
date                                                                          
2016-01-04       158.79       156.26       155.15       138.28       136.76   
2016-01-05       159.05       156.52       155.42       138.73       137.21   
2016-01-06       159.69       157.15       156.04       139.01       137.49   
2016-01-07       159.18       156.62       155.53       138.18       136.66   
2016-01-08       159.66       157.11       155.98       138.53       137.01   

ticker     IKU16 Comdty  
field           PX_LAST  
date                     
2016-01-04       136.76  
2016-01-05       137.21  
2016-01-06       137.49  
2016-01-07       136.66  
2016-01-08       137.01

The issue I have is that the methods suggested:

Here using columns.map(): {Pandas: combining header rows of a multiIndex DataFrame}

Here using columns.droplevel(): {Delete second row of header in PANDAS}

The result of both of these is the same - seemingly to get rid of all the data, and the new object I'm left with is simply an index:

dftest = df_px_orig.columns.droplevel(1)

dftest
Out[55]: 
Index(['RXH16 Comdty', 'RXM16 Comdty', 'RXU16 Comdty', 'IKH16 Comdty',
       'IKM16 Comdty', 'IKU16 Comdty'],
      dtype='object', name='ticker')

dftest.head()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-56-f54d042ff4d9> in <module>()
----> 1 dftest.head()

AttributeError: 'Index' object has no attribute 'head'

Perhaps I'm misunderstanding the structure of this dataframe altogether and should use some other method, help would be much appreciated. To be clear, my goal is to simply get rid of the 'PX_LAST' level and to keep everything else the same.

Many thanks

EDIT: Adding some code from the source of the plugin that is used to build the dataframe: {https://github.com/matthewgilbert/pdblp/blob/master/pdblp/pdblp.py}

 def bdh(self, tickers, flds, start_date, end_date, elms=None,
            ovrds=None, longdata=False):
        """
        Get tickers and fields, return pandas DataFrame with columns as
        MultiIndex with levels "ticker" and "field" and indexed by "date".
        If long data is requested return DataFrame with columns
        ["date", "ticker", "field", "value"].
        Parameters
        ----------
        tickers: {list, string}
            String or list of strings corresponding to tickers
        flds: {list, string}
            String or list of strings corresponding to FLDS
        start_date: string
            String in format YYYYmmdd
        end_date: string
            String in format YYYYmmdd
        elms: list of tuples
            List of tuples where each tuple corresponds to the other elements
            to be set, e.g. [("periodicityAdjustment", "ACTUAL")].
            Refer to the HistoricalDataRequest section in the
            'Services & schemas reference guide' for more info on these values
        ovrds: list of tuples
            List of tuples where each tuple corresponds to the override
            field and value
        longdata: boolean
            Whether data should be returned in long data format or pivoted

Can you give us some code which builds part of your dataframe from scratch? See [this answer](https://stackoverflow.com/a/20159305/9209546) for tips. — jpp, Jul 15 '18 at 17:50
Hi! The dataframe is created using `'df_px_orig = con.bdh(contract_list, ['PX_LAST'], '20160101', '20160113')`, where `contract_list = ['RXH16 Comdty', 'RXM16 Comdty' ... etc for all contract tickers]`. The pluigin that does the build is `pdblp`, and doesn't have that much documentation. {https://github.com/matthewgilbert/pdblp}. I will see if I can find something in the source, but i'm a bit of a n00b, so not really sure where to look. — chmith, Jul 15 '18 at 18:06

score 0 · Answer 1 · answered May 22 '20 at 18:55

0

Please try using :

df.reset_index(level = <level of the index you wish to drop> , drop = True, inplace = True)

It does what you are asking for that is to reset a level of index and keep the data as is.

answered May 22 '20 at 18:55

Ashwin Aashu

36
5

Removing or merging multiindex columns in pandas dataframe without losing data

1 Answers1