1

I have the following data frame with multi-index columns:

df = pd.DataFrame(np.arange(6).reshape(2, 3),
    columns=pd.MultiIndex.from_tuples([('foo', 'a'), ('bar', 'a'), ('bar', 'b')]))

  foo bar   
    a   a  b
0   0   1  2
1   3   4  5

I would like to assign a new column ('foo', 'b') such that the order of values in index level 0 is preserved, i.e. the resulting columns should be ('foo', 'a'), ('foo', 'b'), ('bar', 'a'), ('bar', 'b'):

expected = pd.DataFrame(
    [[0, 10, 1, 2], [3, 11, 4, 5]],
    columns=pd.MultiIndex.from_product([['foo', 'bar'], list('ab')]))

      foo     bar   
    a   b   a  b
0   0  10   1  2
1   3  11   4  5

The following would be nice and somehow intuitive, but unfortunately assign doesn't accept positional arguments:

df.assign({('foo', 'b'): [10, 11]})

So I tried various options, but the new column is always appended at the end:

# using column indexer (appends the new column to the end):
df2 = df.copy()
df2['foo', 'b'] = [10, 11]
print(df2)  # columns out of order
print(df2.sort_index(axis=1))  # order of "foo" and "bar" swapped

# using join (appends the new column to the end):
df3 = df.join(pd.DataFrame([10, 11], index=df.index,
    columns=pd.MultiIndex.from_tuples([('foo', 'b')])))
print(df3)  # columns out of order

# saving index levels beforehand doesn't help because they are sorted:
df4 = df.copy()
columns = df.columns.levels[0]  # columns out of order
df4['foo', 'b'] = [10, 11]
df4 = df4[columns]
print(df4)  # columns out of order

I could use [x[0] for x in df.columns], then remove duplicates (without set because order should be preserved) and then use the result to index into the columns of the new data frame but this approach feels way too heavy for such a simple task.

I am aware of this question however the answers there don't preserve the column ordering.

a_guest
  • 34,165
  • 12
  • 64
  • 118

2 Answers2

1

That is insert

df.insert(1, ('foo', 'b'), [10, 11])
df
  foo     bar   
    a   b   a  b
0   0  10   1  2
1   3  11   4  5
BENY
  • 317,841
  • 20
  • 164
  • 234
  • nice 1, `.loc` disrupts the order somehow – anky Sep 02 '19 at 15:37
  • `insert` seems to require the absolute position along the columns, you just hard coded it, but for a more complex setup this might be unfeasible. So I need a separate function to compute that index position? Something like `next(it.dropwhile(lambda x: x[1][0] == 'foo', it.dropwhile(lambda x: x[1][0] != 'foo', enumerate(df.columns))))[0]`. Seems pretty heavy too. – a_guest Sep 02 '19 at 15:39
0

Eventually I went with the following:

  1. create first level column indexer
  2. remove duplicates (using dict preserves order)
  3. use it to index into the new data frame, restoring the old order

Code sample:

df['foo', 'b'] = [10, 11]
df = df[list(dict.fromkeys([x[0] for x in df.columns]))]
a_guest
  • 34,165
  • 12
  • 64
  • 118