I have the following pandas DataFrame with MultiIndex columns:

import pandas as pd
import numpy as np
from collections import defaultdict

np.random.seed(0)

pd_df = defaultdict(list)
categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = [i for i in range(10, 20)]

for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

pd.DataFrame(pd_df)

out:

   cat 1       cat 2     
  signal bkgd signal bkgd
0     44   47     64   67
1     67    9     83   21
2     36   87     70   88
3     88   12     58   65
4     39   87     46   88
5     81   37     25   77
6     72    9     20   80
7     69   79     47   64
8     82   99     88   49
9     29   19     19   14

I'd like to set the mass array as the index column.

What I tried:

pd_df['Mass'] = masses
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

pd.DataFrame(pd_df).set_index("Mass")

... the resulting dataframe loses the MultiIndex columns:

out:

      (cat 1, signal)  (cat 1, bkgd)  (cat 2, signal)  (cat 2, bkgd)
Mass                                                                
10                 44             47               64             67
11                 67              9               83             21
12                 36             87               70             88
13                 88             12               58             65
14                 39             87               46             88
15                 81             37               25             77
16                 72              9               20             80
17                 69             79               47             64
18                 82             99               88             49
19                 29             19               19             14
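The flat tuple labels above appear to come from mixing the plain string key `'Mass'` with tuple keys in the same dict, so pandas does not build a two-level column index. As a minimal sketch (the two-row `data` dict is a hypothetical stand-in for the full data, and `flat` and `fixed` are illustrative names), the levels can be rebuilt after `set_index` with `pd.MultiIndex.from_tuples`:

```python
import pandas as pd

# Hypothetical two-row stand-in for the data in the question.
data = {
    ('cat 1', 'signal'): [44, 67],
    ('cat 1', 'bkgd'): [47, 9],
    'Mass': [10, 11],  # plain string key mixed in with the tuple keys
}

# Same approach as in the attempt above: the column labels stay flat tuples.
flat = pd.DataFrame(data).set_index('Mass')

# Rebuild the two column levels from the remaining tuple labels.
fixed = flat.copy()
fixed.columns = pd.MultiIndex.from_tuples(fixed.columns)
print(fixed)
```

This is a workaround rather than the only option; passing the index when constructing the frame avoids the mixed keys altogether.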

I could add two mass columns, but they are identical, so that would be redundant, and I'd have to set both of them as the index:

for m in masses:
    for cat in categories:
        pd_df[(cat, 'mass')].append(m)
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

pd.DataFrame(pd_df).set_index([('cat 1', 'mass'), ('cat 2', 'mass')])

out:

                             cat 1       cat 2     
                            signal bkgd signal bkgd
(cat 1, mass) (cat 2, mass)                        
10            10                44   47     64   67
11            11                67    9     83   21
12            12                36   87     70   88
13            13                88   12     58   65
14            14                39   87     46   88
15            15                81   37     25   77
16            16                72    9     20   80
17            17                69   79     47   64
18            18                82   99     88   49
19            19                29   19     19   14

What I want is a dataframe that looks like the first output, but with just the mass array as the index. Any help would be appreciated!

  • Welcome to Stack Overflow! Please take the [tour]. [Please don't post pictures of text](https://meta.stackoverflow.com/q/285551/4518341). Instead, copy the text itself into your post, and use the formatting tools like [code formatting](/editing-help#code). You might need to use `print(df)`. As well, please provide a [mre]; random values are not reproducible, so use something like `np.random.seed(0)`. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea May 04 '23 at 18:01
  • Oh wait, you're over-complicating this. Just give the index to the df ctor: `pd.DataFrame(pd_df, index=masses).rename_axis('Mass')`. For the future, beware the [XY problem](https://meta.stackexchange.com/q/66377/343832); take a step back and consider what you're actually trying to accomplish. – wjandrea May 04 '23 at 18:09
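
The one-liner from the comment above can be sketched as a complete script. It reuses the setup from the question unchanged; only the name `df` is introduced here for illustration. `pd_df` holds only tuple keys, so the columns should remain a MultiIndex, and `masses` becomes the row index:

```python
from collections import defaultdict

import numpy as np
import pandas as pd

np.random.seed(0)

categories = ['cat 1', 'cat 2']
sig_bkgd = ['signal', 'bkgd']
masses = list(range(10, 20))

# Only tuple keys go into the dict, so pandas builds MultiIndex columns.
pd_df = defaultdict(list)
for m in masses:
    for cat in categories:
        for sb in sig_bkgd:
            pd_df[(cat, sb)].append(np.random.randint(100))

# Pass the masses as the row index and name that index 'Mass'.
df = pd.DataFrame(pd_df, index=masses).rename_axis('Mass')
print(df)
```

Rows can then be selected by mass, for example `df.loc[12]`, and a single category with `df['cat 1']`.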