1

I apologize if this questions has been asked before, it seems quite trivial, but somehow I cannot find the answer.

Let's say that I have a column (selected) that contains the name of another column (option_1, option_2, option_3, option_4):

    id  option_1    option_2    option_3    option_4    selected
0   0   10.0        NaN         NaN         110.0       option_4
1   1   NaN         20.0        200.0       NaN         option_2
2   2   NaN         300.0       30.0        NaN         option_3
3   3   400.0       NaN         NaN         40.0        option_1
4   4   600.0       700.0       50.0        50.0        option_3

So, for example, for the 1st row (id==0), the selected column indicate that the value is stored in option_4 (i.e. 110.0), for 2nd row, it is stored in option_2, etc.

Is there a vectorized way (e.g. without using apply) in which I could obtain a new series (column) with just the values indicated in the selected column?

This seems to be such a common case that I find it difficult to believe that there is no other way than using apply, specially considering that methods such as idxmin and idxmax returns exactly columns in that format.


Example of how to achieve the result using apply

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'id': [0,1,2,3,4], 
    'option_1': [10,     np.nan, np.nan, 400,    600], 
    'option_2': [np.nan, 20,     300,    np.nan, 700], 
    'option_3': [np.nan, 200,    30,     np.nan, 50],
    'option_4': [110,    np.nan, np.nan, 40,     50], 
    'selected': ['option_4','option_2','option_3','option_1','option_3']
})
df['value'] = df.apply(lambda x: x[x['selected']], axis=1)
toto_tico
  • 17,977
  • 9
  • 97
  • 116

1 Answers1

4

Use DataFrame.lookup:

df['value'] = df.lookup(df.index, df['selected'])
print (df)
   id  option_1  option_2  option_3  option_4  selected  value
0   0      10.0       NaN       NaN     110.0  option_4  110.0
1   1       NaN      20.0     200.0       NaN  option_2   20.0
2   2       NaN     300.0      30.0       NaN  option_3   30.0
3   3     400.0       NaN       NaN      40.0  option_1  400.0
4   4     600.0     700.0      50.0      50.0  option_3   50.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @jesrael, I used your answer and noticed that my code was even slower. After some testing, I am finding that `min` is faster than `lookup` (see a [question here](https://stackoverflow.com/questions/51934952/why-is-df-lookup-slower-than-df-min)). Do you have an idea what could I missing? – toto_tico Aug 20 '18 at 16:28
  • @toto_tico Hard question, I am not pandas developer. Maybe problem pandas is more optimalized for a few number of columns and large number od rows. – jezrael Aug 20 '18 at 16:59