I apologize if this questions has been asked before, it seems quite trivial, but somehow I cannot find the answer.
Let's say that I have a column (selected
) that contains the name of another column (option_1
, option_2
, option_3
, option_4
):
id option_1 option_2 option_3 option_4 selected
0 0 10.0 NaN NaN 110.0 option_4
1 1 NaN 20.0 200.0 NaN option_2
2 2 NaN 300.0 30.0 NaN option_3
3 3 400.0 NaN NaN 40.0 option_1
4 4 600.0 700.0 50.0 50.0 option_3
So, for example, for the 1st row (id==0
), the selected
column indicate that the value is stored in option_4
(i.e. 110.0), for 2nd row, it is stored in option_2
, etc.
Is there a vectorized way (e.g. without using apply) in which I could obtain a new series (column) with just the values indicated in the selected
column?
This seems to be such a common case that I find it difficult to believe that there is no other way than using apply, specially considering that methods such as idxmin
and idxmax
returns exactly columns in that format.
Example of how to achieve the result using apply
import pandas as pd
import numpy as np
df = pd.DataFrame({
'id': [0,1,2,3,4],
'option_1': [10, np.nan, np.nan, 400, 600],
'option_2': [np.nan, 20, 300, np.nan, 700],
'option_3': [np.nan, 200, 30, np.nan, 50],
'option_4': [110, np.nan, np.nan, 40, 50],
'selected': ['option_4','option_2','option_3','option_1','option_3']
})
df['value'] = df.apply(lambda x: x[x['selected']], axis=1)