0

I have a df with categories and thresholds:

cat t1 t2 t3 t4
a   2  4  6  8
b   3  5  7  0
c   0  0  1  0

My end goal is to return the column name given category and score. I can select a row using a cat variable:

df[df['cat'] == cat]

How do I now return the column name that is closest to the score (rounded down)? (c, 3) -> t3

rpanai
  • 12,515
  • 2
  • 42
  • 64
DataCycle
  • 17
  • 4
  • @njzk2 - Not sure it does. I can see how it returns the correct row of a column but not correct column. – DataCycle Jan 02 '22 at 21:21

2 Answers2

1

You can compute the absolute difference to your value and get the index of the minimum with idxmin:

value = 3
cat = 'c'

(df.set_index('cat')
   .loc[cat]
   .sub(value).abs()
   .idxmin()
 )

Output: 't3'

ensuring rounded down

value = 1
cat = 'a'

out = (
 df.set_index('cat')
   .loc[cat]
   .sub(value).abs()
   .idxmin()
 )
x = df.set_index('cat').loc[cat,out]

out = None if value < x else out

print(out)
mozway
  • 194,879
  • 13
  • 39
  • 75
  • The code does work except in edge cases. When I enter value=1 and cat=a I get back t1 when it should be none or empty. I could remedy this by inserting a new 'none' column with 0 values that way it should return none. – DataCycle Jan 02 '22 at 21:54
  • I can confirm adding a new column 'none' with 0 values did the trick. Thank you @mozway. – DataCycle Jan 02 '22 at 21:59
  • @DataCycle sorry I had not seen "rounded down", then you could also check the found **value** and ensure it is not smaller than `value` (see update) – mozway Jan 03 '22 at 00:16
  • Brilliant! Works great. Thank you. – DataCycle Jan 03 '22 at 20:15
0

Try this:

>>> df
  cat  t1  t2  t3  t4
0   a   2   4   6   8
1   b   3   5   7   0
2   c   0   0   1   0

>>> category = 'c'
>>> score = 3
>>> df.set_index('cat').loc[category, f't{score}']
1

>>> category = 'b'
>>> score = 2
>>> df.set_index('cat').loc[category, f't{score}']
5