I have a dataframe created by the pdblp
module for python. The df contains price data with dates as index, and what looks like a multiindex / two layers of columns (ticker and 'PX_LAST'). In order to work with the data in an easier way, I'd like to get rid of the sub-header 'PX_LAST' (I don't need it, and it messes with some lookup functions I am doing separately).
This is how the dataframe looks:
df_px_orig.columns
Out[60]:
MultiIndex(levels=[['IKH16 Comdty', 'IKM16 Comdty', 'IKU16 Comdty', 'RXH16 Comdty', 'RXM16 Comdty', 'RXU16 Comdty'], ['PX_LAST']],
labels=[[3, 4, 5, 0, 1, 2], [0, 0, 0, 0, 0, 0]],
names=['ticker', 'field'])
df_px_orig.head()
Out[41]:
ticker RXH16 Comdty RXM16 Comdty RXU16 Comdty IKH16 Comdty IKM16 Comdty \
field PX_LAST PX_LAST PX_LAST PX_LAST PX_LAST
date
2016-01-04 158.79 156.26 155.15 138.28 136.76
2016-01-05 159.05 156.52 155.42 138.73 137.21
2016-01-06 159.69 157.15 156.04 139.01 137.49
2016-01-07 159.18 156.62 155.53 138.18 136.66
2016-01-08 159.66 157.11 155.98 138.53 137.01
ticker IKU16 Comdty
field PX_LAST
date
2016-01-04 136.76
2016-01-05 137.21
2016-01-06 137.49
2016-01-07 136.66
2016-01-08 137.01
The issue I have is that the methods suggested:
Here using columns.map(): {Pandas: combining header rows of a multiIndex DataFrame}
Here using columns.droplevel(): {Delete second row of header in PANDAS}
The result of both of these is the same - seemingly to get rid of all the data, and the new object I'm left with is simply an index:
dftest = df_px_orig.columns.droplevel(1)
dftest
Out[55]:
Index(['RXH16 Comdty', 'RXM16 Comdty', 'RXU16 Comdty', 'IKH16 Comdty',
'IKM16 Comdty', 'IKU16 Comdty'],
dtype='object', name='ticker')
dftest.head()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-56-f54d042ff4d9> in <module>()
----> 1 dftest.head()
AttributeError: 'Index' object has no attribute 'head'
Perhaps I'm misunderstanding the structure of this dataframe altogether and should use some other method, help would be much appreciated. To be clear, my goal is to simply get rid of the 'PX_LAST' level and to keep everything else the same.
Many thanks
EDIT: Adding some code from the source of the plugin that is used to build the dataframe: {https://github.com/matthewgilbert/pdblp/blob/master/pdblp/pdblp.py}
def bdh(self, tickers, flds, start_date, end_date, elms=None,
ovrds=None, longdata=False):
"""
Get tickers and fields, return pandas DataFrame with columns as
MultiIndex with levels "ticker" and "field" and indexed by "date".
If long data is requested return DataFrame with columns
["date", "ticker", "field", "value"].
Parameters
----------
tickers: {list, string}
String or list of strings corresponding to tickers
flds: {list, string}
String or list of strings corresponding to FLDS
start_date: string
String in format YYYYmmdd
end_date: string
String in format YYYYmmdd
elms: list of tuples
List of tuples where each tuple corresponds to the other elements
to be set, e.g. [("periodicityAdjustment", "ACTUAL")].
Refer to the HistoricalDataRequest section in the
'Services & schemas reference guide' for more info on these values
ovrds: list of tuples
List of tuples where each tuple corresponds to the override
field and value
longdata: boolean
Whether data should be returned in long data format or pivoted