I have the following pandas DataFrame with MultiIndex columns:
import pandas as pd
import numpy as np
from collections import defaultdict
np.random.seed(0)
pd_df = defaultdict(list)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = [i for i in range(10, 20)]
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))
pd.DataFrame(pd_df)
out:
   cat 1       cat 2
  signal bkgd signal bkgd
0     44   47     64   67
1     67    9     83   21
2     36   87     70   88
3     88   12     58   65
4     39   87     46   88
5     81   37     25   77
6     72    9     20   80
7     69   79     47   64
8     82   99     88   49
9     29   19     19   14
I'd like to set the mass array as the index column.
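One possible approach, sketched here and only checked against the snippet above: keep the masses out of `pd_df` entirely and pass them as the `index` argument of `pd.DataFrame`. Since every key in the dict is then a tuple, pandas should still build MultiIndex columns. The variable name `df` is just for illustration.

```python
import numpy as np
import pandas as pd
from collections import defaultdict

np.random.seed(0)
pd_df = defaultdict(list)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = list(range(10, 20))

# Fill the dict with tuple keys only; the masses are kept out of it.
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

# Every key is a tuple, so pandas builds MultiIndex columns;
# the masses become the row index, named 'Mass'.
df = pd.DataFrame(pd_df, index=pd.Index(masses, name='Mass'))
print(df)
```

The lists in the dict are matched to the index by position, so their length must equal `len(masses)`.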
What I tried:
pd_df['Mass'] = masses
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))
pd.DataFrame(pd_df).set_index("Mass")
...but the resulting DataFrame loses the MultiIndex columns:
out:
      (cat 1, signal)  (cat 1, bkgd)  (cat 2, signal)  (cat 2, bkgd)
Mass
10                 44             47               64             67
11                 67              9               83             21
12                 36             87               70             88
13                 88             12               58             65
14                 39             87               46             88
15                 81             37               25             77
16                 72              9               20             80
17                 69             79               47             64
18                 82             99               88             49
19                 29             19               19             14
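A likely explanation for the flattened columns: `pd_df['Mass'] = masses` puts a plain string key next to the tuple keys, and pandas only turns the column labels into a MultiIndex when every key is a tuple. With mixed keys the columns become a flat Index whose entries happen to be tuples, and `set_index('Mass')` leaves them flat. A minimal sketch of repairing the result after the fact, assuming the remaining column labels are all 2-tuples, is to rebuild the MultiIndex with `pd.MultiIndex.from_tuples` (the name `was_multiindex` is hypothetical, only there to show the intermediate state):

```python
import numpy as np
import pandas as pd
from collections import defaultdict

np.random.seed(0)
pd_df = defaultdict(list)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = list(range(10, 20))

# Same as the attempt above: a plain string key next to tuple keys.
pd_df['Mass'] = masses
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

df = pd.DataFrame(pd_df).set_index('Mass')

# Mixed key types give a flat Index of tuples, not a MultiIndex.
was_multiindex = isinstance(df.columns, pd.MultiIndex)
print(was_multiindex)

# Rebuild the two-level column labels from the remaining tuple labels.
df.columns = pd.MultiIndex.from_tuples(df.columns)
print(df)
```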
I could add a mass column under each category, but the two columns are identical, so that is redundant and I'd have to set both of them as the index:
for m in masses:
    for cat in categories:
        pd_df[(cat, 'mass')].append(m)
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))
pd.DataFrame(pd_df).set_index([('cat 1', 'mass'), ('cat 2', 'mass')])
out:
                             cat 1       cat 2
                            signal bkgd signal bkgd
(cat 1, mass) (cat 2, mass)
10            10                44   47     64   67
11            11                67    9     83   21
12            12                36   87     70   88
13            13                88   12     58   65
14            14                39   87     46   88
15            15                81   37     25   77
16            16                72    9     20   80
17            17                69   79     47   64
18            18                82   99     88   49
19            19                29   19     19   14
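If the duplicated mass columns are already there, one way to remove the redundancy (a sketch, assuming pandas 0.24 or newer for `DataFrame.droplevel`) is to set both as the index as above, drop the second index level, and rename the remaining one to `Mass`:

```python
import numpy as np
import pandas as pd
from collections import defaultdict

np.random.seed(0)
pd_df = defaultdict(list)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = list(range(10, 20))

# Same construction as the two-mass-column attempt above.
for m in masses:
    for cat in categories:
        pd_df[(cat, 'mass')].append(m)
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

df = (
    pd.DataFrame(pd_df)
    .set_index([('cat 1', 'mass'), ('cat 2', 'mass')])
    .droplevel(1)          # drop the redundant ('cat 2', 'mass') level
    .rename_axis('Mass')   # give the remaining index level a plain name
)
print(df)
```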
What I want is a DataFrame that looks like the first output, but with the mass array as the index. Any help would be appreciated!
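For comparison, a sketch that avoids the dict altogether: build the column MultiIndex with `pd.MultiIndex.from_product` and draw all values in a single `np.random.randint` call. The values come from one vectorised draw instead of the nested loop, so they need not be identical to the numbers printed above.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = list(range(10, 20))

# Two-level column labels: every category paired with every sig/bkgd label,
# i.e. ('cat 1', 'signal'), ('cat 1', 'bkgd'), ('cat 2', 'signal'), ('cat 2', 'bkgd').
columns = pd.MultiIndex.from_product([categories, sig_bkgd])

# One row per mass, one column per (category, sig/bkgd) pair.
values = np.random.randint(100, size=(len(masses), len(columns)))

df = pd.DataFrame(values, index=pd.Index(masses, name='Mass'), columns=columns)
print(df)
```

`from_product` keeps the categories as the outer level and `signal`/`bkgd` as the inner level, which is the same column layout as the first output.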