I'm having problems with MultiIndex and stack(). The following example is based on a solution from Calvin Cheung on StackOvervlow.
=== multi.csv ===
h1,main,h3,sub,h5
a,A,1,A1,1
b,B,2,B1,2
c,B,3,A1,3
d,A,4,B2,4
e,A,5,B3,5
f,B,6,A2,6
=== multi.py ===
#!/usr/bin/env python
import pandas as pd
df1 = pd.read_csv('multi.csv')
df2 = df1.pivot('main', 'sub').stack()
print(df2)
=== output ===
h1 h3 h5
main sub
A A1 a 1 1
B2 d 4 4
B3 e 5 5
B A1 c 3 3
A2 f 6 6
B1 b 2 2
This works as long as the entries in the sub column are unique with respect to the corresponding entry in the main column. But if we change the sub column entry in row e to B2, then B2 is no longer unique in the group of A rows and we get an error message: "pandas.core.reshape.ReshapeError: Index contains duplicate entries, cannot reshape".
I was expected the shape of the sub index to behave like the shape of the primary index where duplicates are indicated with blank entries under the first row entry.
=== expected output ===
h1 h3 h5
main sub
A A1 a 1 1
B2 d 4 4
e 5 5
B A1 c 3 3
A2 f 6 6
B1 b 2 2
So my question is, how can I structure a MultiIndex in a way that allows duplicates in sub-levels?