Here is possible specify first and second row in parameter header
for MultiIndex
in columns and index_col
for index from first column in function read_excel
:
df = pd.read_excel('file.xlsx', header=[0,1], index_col=[0], sheet_name=0)
Your solution should be changed with same parameters:
excel_file = pd.ExcelFile('file.xlsx')
df = excel_file.parse(header=[0,1], index_col=[0], sheet_name=0)
print (df)
CUSTOM NAME g1 g2
NAME A B A B C
NAME 1 1.0 NaN 1 NaN 1.0
NAME 1 NaN 1.0 1 1.0 NaN
print (df.columns)
MultiIndex(levels=[['g1', 'g2'], ['A', 'B', 'C']],
codes=[[0, 0, 1, 1, 1], [0, 1, 0, 1, 2]],
names=['CUSTOM NAME', 'NAME'])
print (df.index)
Index(['NAME 1', 'NAME 1'], dtype='object')
Filtering working with tuples for select columns of MultiIndex
:
print (df[df[('g1', 'A')] == 1])
CUSTOM NAME g1 g2
NAME A B A B C
NAME 1 1.0 NaN 1 NaN 1.0
More information in Select rows in pandas MultiIndex DataFrame
, only remove loc
, because MultiIndex
in columns.