I have the following data frame with multi-index columns:
df = pd.DataFrame(np.arange(6).reshape(2, 3),
columns=pd.MultiIndex.from_tuples([('foo', 'a'), ('bar', 'a'), ('bar', 'b')]))
foo bar
a a b
0 0 1 2
1 3 4 5
I would like to assign a new column ('foo', 'b')
such that the order of values in index level 0 is preserved, i.e. the resulting columns should be ('foo', 'a'), ('foo', 'b'), ('bar', 'a'), ('bar', 'b')
:
expected = pd.DataFrame(
[[0, 10, 1, 2], [3, 11, 4, 5]],
columns=pd.MultiIndex.from_product([['foo', 'bar'], list('ab')]))
foo bar
a b a b
0 0 10 1 2
1 3 11 4 5
The following would be nice and somehow intuitive, but unfortunately assign
doesn't accept positional arguments:
df.assign({('foo', 'b'): [10, 11]})
So I tried various options, but the new column is always appended at the end:
# using column indexer (appends the new column to the end):
df2 = df.copy()
df2['foo', 'b'] = [10, 11]
print(df2) # columns out of order
print(df2.sort_index(axis=1)) # order of "foo" and "bar" swapped
# using join (appends the new column to the end):
df3 = df.join(pd.DataFrame([10, 11], index=df.index,
columns=pd.MultiIndex.from_tuples([('foo', 'b')])))
print(df3) # columns out of order
# saving index levels beforehand doesn't help because they are sorted:
df4 = df.copy()
columns = df.columns.levels[0] # columns out of order
df4['foo', 'b'] = [10, 11]
df4 = df4[columns]
print(df4) # columns out of order
I could use [x[0] for x in df.columns]
, then remove duplicates (without set
because order should be preserved) and then use the result to index into the columns of the new data frame but this approach feels way too heavy for such a simple task.
I am aware of this question however the answers there don't preserve the column ordering.