Suppose I have some example data:
import pandas as pd
data = [
{"name": "cat", "var": "v1", "col1": 5, "col2": 7},
{"name": "cat", "var": "v2", "col1": 4, "col2": 8},
{"name": "dog", "var": "v1", "col1": 11, "col2": 15},
{"name": "dog", "var": "v2", "col1": 12, "col2": 14},
]
df = pd.DataFrame(data)
print("DATA\n", df)
DATA
name var col1 col2
0 cat v1 5 7
1 cat v2 4 8
2 dog v1 11 15
3 dog v2 12 14
I would like to get all values of col2
where col1
is minimum with each name
, like the following:
name col2
cat 8
dog 15
My idea was to first pivot the data to deduplicate the names:
df = df.pivot("name", "var", ["col1", "col2"])
print("\nPIVOT\n", df)
PIVOT
col1 col2
var v1 v2 v1 v2
name
cat 5 4 7 8
dog 11 12 15 14
Then I could get the minimum indices, to know which var
contains the minimum for col1
:
(Side note: is there a way to select just all var
s without listing each possiblity?)
dfMinIdx = df[[("col1", "v1"), ("col1", "v2")]].idxmin(axis=1)
print("\nMINIDX\n", dfMinIdx)
MINIDX
name
cat (col1, v2)
dog (col1, v1)
dtype: object
Now I know that for cat
v2
should be selected and for dog
it should pick v1
.
But this is where I'm suck. How would I access the col2
values from dfMinIdx
?
This is how I built an adjusted indexing DataFrame, but not sure how to "apply" it to df:
dfOtherIndex = dfMinIdx.transform(lambda x: ("col2", x[1]))
An alternative question would be how do I access a MultiIndex (here "PIVOT") with a Series of tuples (here "MINIDX")? I just wanted to give the full context in case I'm on the wrong path.