Assign multi-index column while preserving the order of index level values

Question

I have the following data frame with multi-index columns:

df = pd.DataFrame(np.arange(6).reshape(2, 3),
    columns=pd.MultiIndex.from_tuples([('foo', 'a'), ('bar', 'a'), ('bar', 'b')]))

  foo bar   
    a   a  b
0   0   1  2
1   3   4  5

I would like to assign a new column ('foo', 'b') such that the order of values in index level 0 is preserved, i.e. the resulting columns should be ('foo', 'a'), ('foo', 'b'), ('bar', 'a'), ('bar', 'b'):

expected = pd.DataFrame(
    [[0, 10, 1, 2], [3, 11, 4, 5]],
    columns=pd.MultiIndex.from_product([['foo', 'bar'], list('ab')]))

      foo     bar   
    a   b   a  b
0   0  10   1  2
1   3  11   4  5

The following would be nice and somehow intuitive, but unfortunately assign doesn't accept positional arguments:

df.assign({('foo', 'b'): [10, 11]})

So I tried various options, but the new column is always appended at the end:

# using column indexer (appends the new column to the end):
df2 = df.copy()
df2['foo', 'b'] = [10, 11]
print(df2)  # columns out of order
print(df2.sort_index(axis=1))  # order of "foo" and "bar" swapped

# using join (appends the new column to the end):
df3 = df.join(pd.DataFrame([10, 11], index=df.index,
    columns=pd.MultiIndex.from_tuples([('foo', 'b')])))
print(df3)  # columns out of order

# saving index levels beforehand doesn't help because they are sorted:
df4 = df.copy()
columns = df.columns.levels[0]  # columns out of order
df4['foo', 'b'] = [10, 11]
df4 = df4[columns]
print(df4)  # columns out of order

I could use [x[0] for x in df.columns], then remove duplicates (without set because order should be preserved) and then use the result to index into the columns of the new data frame but this approach feels way too heavy for such a simple task.

I am aware of this question however the answers there don't preserve the column ordering.

score 1 · Answer 1 · answered Sep 02 '19 at 15:23

1

That is insert

df.insert(1, ('foo', 'b'), [10, 11])
df
  foo     bar   
    a   b   a  b
0   0  10   1  2
1   3  11   4  5

answered Sep 02 '19 at 15:23

BENY

317,841
20
164
234

nice 1, `.loc` disrupts the order somehow – anky Sep 02 '19 at 15:37
`insert` seems to require the absolute position along the columns, you just hard coded it, but for a more complex setup this might be unfeasible. So I need a separate function to compute that index position? Something like `next(it.dropwhile(lambda x: x[1][0] == 'foo', it.dropwhile(lambda x: x[1][0] != 'foo', enumerate(df.columns))))[0]`. Seems pretty heavy too. – a_guest Sep 02 '19 at 15:39

score 0 · Accepted Answer · answered Sep 04 '19 at 08:19

Eventually I went with the following:

create first level column indexer
remove duplicates (using dict preserves order)
use it to index into the new data frame, restoring the old order

Code sample:

df['foo', 'b'] = [10, 11]
df = df[list(dict.fromkeys([x[0] for x in df.columns]))]

Assign multi-index column while preserving the order of index level values

2 Answers2