0

Suppose I have some example data:

import pandas as pd

data = [
    {"name": "cat", "var": "v1", "col1": 5, "col2": 7},
    {"name": "cat", "var": "v2", "col1": 4, "col2": 8},
    {"name": "dog", "var": "v1", "col1": 11, "col2": 15},
    {"name": "dog", "var": "v2", "col1": 12, "col2": 14},
]
df = pd.DataFrame(data)
print("DATA\n", df)
DATA
   name var  col1  col2
0  cat  v1     5     7
1  cat  v2     4     8
2  dog  v1    11    15
3  dog  v2    12    14

I would like to get all values of col2 where col1 is minimum with each name, like the following:

name  col2
cat   8
dog   15

My idea was to first pivot the data to deduplicate the names:

df = df.pivot("name", "var", ["col1", "col2"])
print("\nPIVOT\n", df)
PIVOT
      col1     col2    
var    v1  v2   v1  v2
name                  
cat     5   4    7   8
dog    11  12   15  14

Then I could get the minimum indices, to know which var contains the minimum for col1:

(Side note: is there a way to select just all vars without listing each possiblity?)

dfMinIdx = df[[("col1", "v1"), ("col1", "v2")]].idxmin(axis=1)
print("\nMINIDX\n", dfMinIdx)
MINIDX
 name
cat    (col1, v2)
dog    (col1, v1)
dtype: object

Now I know that for cat v2 should be selected and for dog it should pick v1.

But this is where I'm suck. How would I access the col2 values from dfMinIdx?

This is how I built an adjusted indexing DataFrame, but not sure how to "apply" it to df:

dfOtherIndex = dfMinIdx.transform(lambda x: ("col2", x[1]))

An alternative question would be how do I access a MultiIndex (here "PIVOT") with a Series of tuples (here "MINIDX")? I just wanted to give the full context in case I'm on the wrong path.

typ1232
  • 5,535
  • 6
  • 35
  • 51

0 Answers0