I'm still a novice in Pandas and cannot seem to combine these few basic steps.
Goal:
I would like to perform an efficient lookup and replace of multiple columns based on a conditional.
I have dataframe df, and need an index lookup from another dataframe lookup if columns lower_limit and upper_limit are both NaN.
I couldn't get merge/join to work because there is a difference between index names (think C_something, F_something from DataFrame lookup), left out for simplicity.
Input:
DataFrames:
import pandas as pd; import numpy as np
df = pd.DataFrame([['A', 3, 5],['B', 2, np.NaN],['C', np.NaN, np.NaN],['D', np.NaN, np.NaN]])
df.columns = ['Name','lower_limit','upper_limit']
df = df.set_index('Name')
lookup = pd.DataFrame([['C_Male', 4, 6],['C_Female', 5, 7],['E_Male', 2, 3],['E_Female', 3, 4]])
lookup.columns = ['Name', 'lower', 'upper']
lookup = lookup.set_index('Name')
# index: Name + index_modifier is the lookup index of interest for example
index_modifier = '_Male'
DataFrames visualized:
# df # lookup
lower_limit upper_limit lower upper
Name Name
A 3.0 5.0 C_Male 4 6
B 2.0 NaN C_Female 5 7
C NaN NaN E_Male 2 3
D NaN NaN E_Female 3 4
Expected output:
# df
lower_limit upper_limit
Name
A 3.0 5.0
B 2.0 NaN #<-- Does not meet conditional
C 4.0 6.0 #<-- Looked-up with index_modifier and changed
D NaN NaN #<-- Looked-up with index_modifier and left unchanged
Broken Code:
I have tried using df.loc() docs and this answer to mask and set values, but cannot seem to get unique values based on that row's index.
Mask and Set Using df.loc
# error: need get index of each row only
df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull()), ['lower_limit','upper_limit'] ] = lookup.loc[df.index + index_modifier]
Mask with df.loc and Then Set
ix_of_interest = df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull())].index
# only keep index values that are in DataFrame 'lookup'
ix_of_interest = [ix for ix in ix_of_interest if ((ix + index_modifier) in lookup.index)]
lookup_ix = [ix + index_modifier for ix in lookup_ix]
# error: Not changing values. I think there is a mismatch of bracket depths for one
df.loc[ix_of_interest, ['lower_limit','upper_limit'] ] = lookup.loc[lookup_ix]
I have also tried to use df.apply() to set the values. See this question.
def do_lookup(row):
# error:'numpy.float64' object has no attribute 'is_null'
if row.lower_limit.isnull() and row.upper_limit.isnull():
if (row.name + index_modifier) in lookup.index:
return lookup.loc[row.name + index_modifier]
df['lower_limit', 'upper_limit'] = df.apply(do_lookup, axis=1)
or lambda
df['lower_limit', 'upper_limit'] = df.apply(lambda x: lookup.loc[x.name + index_modifier].to_list()
# isnull() or isnan() would be better
if ((x.lower_limit == np.NaN) and (x.upper_limit == np.NaN))
# else may not be needed here
else [np.NaN, np.NaN],
axis=1)
This seems like it should be a series of simple steps, but I cannot get them to work correctly. Any insight would be greatly appreciated - my rubber ducky is tired and confused.