How to work with columns after using to_flat_index?

Question

I observed some strange behaviour of the column names when using to_flat_index() function.

Starting with a MultiIndex dataframe

a=[0,.25, .5, .75]
b=[1, 2, 3, 4]
c=[5, 6, 7, 8]
d=[1, 2, 3, 5]
df=pd.DataFrame(data={('a','a'):a, ('b', 'b'):b, ('c', 'c'):c, ('d', 'd'):d})

Produces this dataframe

      a  b  c  d
      a  b  c  d
0  0.00  1  5  1
1  0.25  2  6  2
2  0.50  3  7  3
3  0.75  4  8  5

Use the .to_flat_index to flatten the index

df.columns = df.columns.to_flat_index()

Produces the following dataframe

    (a, a)  (b, b)  (c, c)  (d, d)
0   0.00    1   5   1
1   0.25    2   6   2
2   0.50    3   7   3
3   0.75    4   8   5

If I try to select a column using df['(a, a)'] method I get a KeyError message. If I try to clean up the column name using df.columns = df.columns.str.lower().str.rstrip() (or any other .str method) I get nan instead of column names

NaN NaN NaN NaN
0   0.00    1   5   1
1   0.25    2   6   2
2   0.50    3   7   3
3   0.75    4   8   5

What am I doing wrong. How can I select the column after using to_flat_index()?

https://stackoverflow.com/questions/14507794/pandas-how-to-flatten-a-hierarchical-index-in-columns This thread might be helpful — Varsha, Mar 03 '20 at 22:20

score 3 · Accepted Answer · answered Mar 03 '20 at 22:17

3

If you want to select one column you have to do it with a tuple:

df[('a', 'a')]

If you only want one level column with the first element of the tuple as the name this should be enough:

df.columns = df.columns.to_flat_index().map(lambda x: x[0])

answered Mar 03 '20 at 22:17

jjsantoso

1,586
1
12
17

Thank you. Is there a way to get a real flat index of multiindex columns? I basically need column names like 'a_a', 'b_b', etc. – Eldar Vagapov Mar 03 '20 at 22:32
Yes, with `df.columns.to_flat_index().map(lambda x: '_'.join(x))` – jjsantoso Mar 03 '20 at 22:40

score 0 · Answer 2 · answered Mar 03 '20 at 22:24

With to_flat_index() you are converting the columns into a list of tuple, where elements of tuples are the level values of multinindex.

For the given multindex after flattening, the new columns are [('a', 'a'), ('b', 'b'), ('c', 'c'), ('d', 'd')]

So when you are applying string method's on the values of columns e.g. ('a', 'a') will return None which becomes NaN.

If you're trying to extract one specific level of values you can do so by,

df.columns = df.columns.get_level_values(0) # level index

Assuming a complicated operation on the multiple levels of data frame

join_str_with_char = lambda mi_cols, jstr: jstr.join([_.strip() for _ in mi_cols])

df.columns = df.columns.map(lambda x: join_str_with_char(x,'_'))

How to work with columns after using to_flat_index?

2 Answers2