I have the following DataFrame and lists:
import numpy as np
import pandas as pd

df_merge = pd.DataFrame({'column1': [0.5, 0.4, 0.9, 0.7],
                         'column2': [0.7, 0.8, 0.2, 0.38],
                         'column3': [0.6, 0.8, 0.3, 0.67],
                         'column4': [0.1, 0.35, 0.55, 0.6],
                         'group': ['1ab', '2ab', '3ab', '4ab'],
                         'line': ['cc', 'gg', 'nn', 'pp'],
                         'column5': ['-1', '-1', '0', '0']})
list_0 = ['aa', 'bb', 'cc', 'dd', 'ee', 'ff']
list_1 = ['gg', 'hh', 'ii', 'jj', 'kk']
list_2 = ['ll', 'mm', 'nn']
list_3 = ['oo', 'pp']
I'm trying to apply a search (membership test) in the condition list, which is then passed to np.select. For each group, df_merge['line'] can be any value from the corresponding list above (group '1ab' maps to list_0, '2ab' to list_1, '3ab' to list_2 and '4ab' to list_3).
I tried the code below, but I'm not sure it is the right approach. It originally raised "TypeError: unhashable type: 'list'", which I resolved by using df_merge['line'].isin(list_0), and the matching list for each of the other groups:
# isin() already returns a boolean mask per row, so it is combined with & directly
condition = [(df_merge['group'] == '1ab') & df_merge['line'].isin(list_0),
             (df_merge['group'] == '2ab') & df_merge['line'].isin(list_1),
             (df_merge['group'] == '3ab') & df_merge['line'].isin(list_2),
             (df_merge['group'] == '4ab') & df_merge['line'].isin(list_3)]
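A minimal sketch of the same four conditions built from a mapping, assuming each group should only match the values in one list; group_lists and mismatched are hypothetical names introduced here for illustration. Series.isin returns one True/False per row, so it can be combined with & as-is. Comparing that result back to df_merge['line'] with == instead tests strings such as 'cc' against True/False, which never match, so every condition ends up False.

```python
import pandas as pd

df_merge = pd.DataFrame({'group': ['1ab', '2ab', '3ab', '4ab'],
                         'line': ['cc', 'gg', 'nn', 'pp']})
list_0 = ['aa', 'bb', 'cc', 'dd', 'ee', 'ff']
list_1 = ['gg', 'hh', 'ii', 'jj', 'kk']
list_2 = ['ll', 'mm', 'nn']
list_3 = ['oo', 'pp']

# Hypothetical mapping from each group to the list its 'line' value must come from
group_lists = {'1ab': list_0, '2ab': list_1, '3ab': list_2, '4ab': list_3}

# One boolean Series per group, in the same order as the choices given to np.select
# (dicts keep insertion order in Python 3.7+)
condition = [(df_merge['group'] == g) & df_merge['line'].isin(values)
             for g, values in group_lists.items()]

# Comparing the strings with the isin() mask tests e.g. 'cc' == True, so it never matches
mismatched = (df_merge['group'] == '1ab') & (df_merge['line'] == df_merge['line'].isin(list_0))

for g, mask in zip(group_lists, condition):
    print(g, mask.tolist())
# 1ab [True, False, False, False]
# 2ab [False, True, False, False]
# 3ab [False, False, True, False]
# 4ab [False, False, False, True]
print(mismatched.tolist())  # [False, False, False, False]
```

Keeping the group-to-list pairing in one dict means adding a group only touches one place, as long as the choices list is kept in the same order.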
After building the conditions, I run the rest of the code:
# each np.where(...) gives 0 where the value is >= 0.6 and 1 otherwise
choices = [
    1 - (np.where(df_merge['column1'] >= 0.6, 0, 1) + np.where(df_merge['column2'] >= 0.6, 0, 1)
         + np.where(df_merge['column3'] >= 0.6, 0, 1) + np.where(df_merge['column4'] >= 0.6, 0, 1)),
    1 - (np.where(df_merge['column1'] >= 0.6, 0, 1) + np.where(df_merge['column2'] >= 0.6, 0, 1)
         + np.where(df_merge['column3'] >= 0.6, 0, 1)),
    1 - (np.where(df_merge['column1'] >= 0.6, 0, 1) + np.where(df_merge['column2'] >= 0.6, 0, 1)
         + np.where(df_merge['column4'] >= 0.6, 0, 1)),
    1 - (np.where(df_merge['column1'] >= 0.6, 0, 1) + np.where(df_merge['column2'] >= 0.6, 0, 1)),
]
df_merge['column5'] = np.select(condition, choices,
                                default=1 - (np.where(df_merge['column1'] >= 0.6, 0, 1)
                                             + np.where(df_merge['column2'] >= 0.6, 0, 1)))
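np.select accepts any arrays of matching length as choices, so the arrays returned by np.where are valid there. All four choices follow one pattern: 1 minus the number of columns, within a subset, whose value is not >= 0.6. A sketch of a more compact equivalent, assuming column1 to column4 are already numeric; below_count is a hypothetical helper, not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

df_merge = pd.DataFrame({'column1': [0.5, 0.4, 0.9, 0.7],
                         'column2': [0.7, 0.8, 0.2, 0.38],
                         'column3': [0.6, 0.8, 0.3, 0.67],
                         'column4': [0.1, 0.35, 0.55, 0.6]})

def below_count(df, cols, threshold=0.6):
    """Hypothetical helper: per row, count the columns whose value is NOT >= threshold.

    ~(df >= threshold) is used instead of (df < threshold) so that NaN counts as 1,
    exactly like np.where(x >= threshold, 0, 1).
    """
    return (~(df[cols] >= threshold)).sum(axis=1).to_numpy()

choices = [1 - below_count(df_merge, ['column1', 'column2', 'column3', 'column4']),
           1 - below_count(df_merge, ['column1', 'column2', 'column3']),
           1 - below_count(df_merge, ['column1', 'column2', 'column4']),
           1 - below_count(df_merge, ['column1', 'column2'])]

# The first choice written with explicit np.where calls, for comparison
explicit = 1 - sum(np.where(df_merge[c] >= 0.6, 0, 1)
                   for c in ['column1', 'column2', 'column3', 'column4'])
print(np.array_equal(choices[0], explicit))  # True
```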
I'm not sure whether np.where can be used inside choices as in the code above. It raised "TypeError: '>=' not supported between instances of 'str' and 'float'", which I solved by converting the string values in column1 through column4 to numeric.
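A minimal end-to-end sketch of that fix, assuming the real data arrives with column1 to column4 as strings (as in the expected-output frame below). pd.to_numeric converts each column to floats, after which the >= comparisons and np.select run. The names num_cols, flag and group_lists are hypothetical and introduced only for this sketch.

```python
import numpy as np
import pandas as pd

df_merge = pd.DataFrame({'column1': ['0.5', '0.4', '0.9', '0.7'],
                         'column2': ['0.7', '0.8', '0.2', '0.38'],
                         'column3': ['0.6', '0.8', '0.3', '0.67'],
                         'column4': ['0.1', '0.35', '0.55', '0.6'],
                         'group': ['1ab', '2ab', '3ab', '4ab'],
                         'line': ['cc', 'gg', 'nn', 'pp']})
group_lists = {'1ab': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
               '2ab': ['gg', 'hh', 'ii', 'jj', 'kk'],
               '3ab': ['ll', 'mm', 'nn'],
               '4ab': ['oo', 'pp']}

num_cols = ['column1', 'column2', 'column3', 'column4']
# Strings such as '0.5' cannot be compared with 0.6, so convert them to floats first
df_merge[num_cols] = df_merge[num_cols].apply(pd.to_numeric)

# 0 where a value is >= 0.6, 1 otherwise, as in the question
flag = {c: np.where(df_merge[c] >= 0.6, 0, 1) for c in num_cols}

condition = [(df_merge['group'] == g) & df_merge['line'].isin(v)
             for g, v in group_lists.items()]
choices = [1 - (flag['column1'] + flag['column2'] + flag['column3'] + flag['column4']),
           1 - (flag['column1'] + flag['column2'] + flag['column3']),
           1 - (flag['column1'] + flag['column2'] + flag['column4']),
           1 - (flag['column1'] + flag['column2'])]

df_merge['column5'] = np.select(condition, choices,
                                default=1 - (flag['column1'] + flag['column2']))
print(df_merge['column5'].tolist())  # [-1, 0, -1, 0]
```

On this sample data each row matches exactly one condition, so the default is never used here.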
The expected output will be:
df_merge = pd.DataFrame({'column1': ['0.5', '0.4', '0.9', '0.7'],
                         'column2': ['0.7', '0.8', '0.2', '0.38'],
                         'column3': ['0.6', '0.8', '0.3', '0.67'],
                         'column4': ['0.1', '0.35', '0.55', '0.6'],
                         'group': ['1ab', '2ab', '3ab', '4ab'],
                         'line': ['cc', 'gg', 'nn', 'pp'],
                         'column5': ['-1', '-1', '0', '0']})
Any help / guidance is much appreciated.