0

I have a problem with slicing data with pandas with multiindex with repetation.

Let's say I have a table (A and B are indicies)

A B C
1 1 11
1 2 12
1 3 13
2 1 21
2 2 22
2 3 23

and so on

And to vectors
a = [1, 2, 3, 1, 2, 1, 2 ]
b = [3, 2, 1, 3, 2, 1, 3 ]

I'd like to slice the table in a way to return vector c with values in line with indicies from vectors a and b.

c = [13, 22, 31, 13, 22, 11, 23] 

Only thing that comes to my mind is to pivot this table and get:

A B1 B2 B3
1 11 12 13
2 21 22 23
3 31 32 33

The apply one index to column A through loc to get proper rows, multiply with indicator matrix for chosing proper column for each row and cumsum to get a vector (with another slicing). I'm sure that there must be easier way to do it but I cannot find the proper way to do it

2 Answers2

3

You can do this by using your a and b arrays to create a new MultiIndex then reindex your dataframe:

Sample Data

import pandas as pd

index = pd.MultiIndex.from_product([[1,2,3], [1,2,3]])
df = pd.DataFrame({"C": [11, 12, 13, 21, 22, 23, 31, 32, 33]}, index=index)

print(df) # dataframe with 2-level index and 1 column "C"
      C
1 1  11
  2  12
  3  13
2 1  21
  2  22
  3  23
3 1  31
  2  32
  3  33

Method

  • Create new MultiIndex from your a and b arrays
  • Align the dataframe (or just column of interest) to this new index
a = [1, 2, 3, 1, 2, 1, 2 ]
b = [3, 2, 1, 3, 2, 1, 3 ]

new_index = pd.MultiIndex.from_arrays([a, b])
new_c = df["C"].reindex(new_index)

print(new_c.to_numpy())
[13 22 31 13 22 11 23]

Method 2

You can also zip your a and b arrays together and simply use .loc to slice your dataframe:

# Select the rows specified by combinations of a, b; in column "C"
new_c = df.loc[zip(a, b), "C"]

print(new_c.to_numpy())
[13 22 31 13 22 11 23]
Cameron Riddell
  • 10,942
  • 9
  • 19
0

An option is merge:

 pd.DataFrame({'A':a,'B':b}).merge(df, on=['A','B'], how='left')['C']

Output (note that NaN values corresponds to where the data is not available in df):

0    13.0
1    22.0
2     NaN
3    13.0
4    22.0
5    11.0
6    23.0
Name: C, dtype: float64
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74