I observed some strange behaviour in the column names after using the to_flat_index() method.
I start with a DataFrame that has MultiIndex columns:
import pandas as pd

a = [0, .25, .5, .75]
b = [1, 2, 3, 4]
c = [5, 6, 7, 8]
d = [1, 2, 3, 5]
df = pd.DataFrame(data={('a', 'a'): a, ('b', 'b'): b, ('c', 'c'): c, ('d', 'd'): d})
This produces the following DataFrame:
      a  b  c  d
      a  b  c  d
0  0.00  1  5  1
1  0.25  2  6  2
2  0.50  3  7  3
3  0.75  4  8  5
I then use .to_flat_index() to flatten the column index:
df.columns = df.columns.to_flat_index()
This produces the following DataFrame:
   (a, a)  (b, b)  (c, c)  (d, d)
0    0.00       1       5       1
1    0.25       2       6       2
2    0.50       3       7       3
3    0.75       4       8       5
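To see what the flattened labels actually are, they can be inspected directly. This is only a minimal sketch on a two-column version of the frame above; the variable name `labels` is just for illustration. It suggests that each label is a Python tuple, even though the printed header looks like a string such as (a, a).

```python
import pandas as pd

# Smaller, two-column version of the frame from the question
df = pd.DataFrame(data={('a', 'a'): [0, .25, .5, .75], ('b', 'b'): [1, 2, 3, 4]})
df.columns = df.columns.to_flat_index()

# Pull the labels out as a plain list to look at their types
labels = list(df.columns)
print(labels)           # the labels themselves
print(type(labels[0]))  # the type of a single label
```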
If I try to select a column with df['(a, a)'], I get a KeyError. If I try to clean up the column names with df.columns = df.columns.str.lower().str.rstrip() (or any other .str method), I get NaN instead of the column names:
    NaN  NaN  NaN  NaN
0  0.00    1    5    1
1  0.25    2    6    2
2  0.50    3    7    3
3  0.75    4    8    5
What am I doing wrong? How can I select a column after using to_flat_index()?
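One thing that may be worth trying, assuming the flattened labels are tuples rather than strings: index with the tuple itself, or join each tuple into a single string before applying any .str method. The sketch below uses a two-column version of the frame; the '_' separator and the names `col`, `flat` and `lowered` are arbitrary choices for illustration, not anything pandas requires.

```python
import pandas as pd

# Smaller, two-column version of the frame from the question
df = pd.DataFrame(data={('a', 'a'): [0, .25, .5, .75], ('b', 'b'): [1, 2, 3, 4]})
df.columns = df.columns.to_flat_index()

# Select with the tuple label instead of the string '(a, a)'
col = df[('a', 'a')]

# Alternatively, turn every tuple label into one string first
flat = df.copy()
flat.columns = ['_'.join(label) for label in flat.columns]

# The labels are now real strings, so .str methods have text to work on
lowered = flat.columns.str.lower().str.rstrip()
print(col)
print(list(lowered))
```

After the join, a column would be selected with the joined name, for example flat['a_a'].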