0

I'm still a novice in Pandas and cannot seem to combine these few basic steps.

Goal:

I would like to perform an efficient lookup and replace of multiple columns based on a conditional.

I have dataframe df, and need an index lookup from another dataframe lookup if columns lower_limit and upper_limit are both NaN.

I couldn't get merge/join to work because there is a difference between index names (think C_something, F_something from DataFrame lookup), left out for simplicity.

Input:

DataFrames:

import pandas as pd; import numpy as np
df = pd.DataFrame([['A', 3, 5],['B', 2, np.NaN],['C', np.NaN, np.NaN],['D', np.NaN, np.NaN]])
df.columns = ['Name','lower_limit','upper_limit']
df = df.set_index('Name')

lookup = pd.DataFrame([['C_Male', 4, 6],['C_Female', 5, 7],['E_Male', 2, 3],['E_Female', 3, 4]])
lookup.columns = ['Name', 'lower', 'upper']
lookup = lookup.set_index('Name')

# index: Name + index_modifier is the lookup index of interest for example
index_modifier = '_Male'

DataFrames visualized:

# df                                  # lookup
      lower_limit  upper_limit                  lower  upper
Name                                  Name              
A             3.0          5.0        C_Male        4      6
B             2.0          NaN        C_Female      5      7
C             NaN          NaN        E_Male        2      3
D             NaN          NaN        E_Female      3      4

Expected output:

# df
      lower_limit  upper_limit
Name                                     
A             3.0          5.0
B             2.0          NaN  #<-- Does not meet conditional
C             4.0          6.0  #<-- Looked-up with index_modifier and changed
D             NaN          NaN  #<-- Looked-up with index_modifier and left unchanged

Broken Code:

I have tried using df.loc() docs and this answer to mask and set values, but cannot seem to get unique values based on that row's index.

Mask and Set Using df.loc

# error: need get index of each row only
df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull()), ['lower_limit','upper_limit'] ] = lookup.loc[df.index + index_modifier]

Mask with df.loc and Then Set

ix_of_interest = df.loc[(df.lower_limit.isnull()) & (df.upper_limit.isnull())].index

# only keep index values that are in DataFrame 'lookup'
ix_of_interest = [ix for ix in ix_of_interest if ((ix + index_modifier) in lookup.index)]
lookup_ix = [ix + index_modifier for ix in lookup_ix]

# error: Not changing values. I think there is a mismatch of bracket depths for one
df.loc[ix_of_interest, ['lower_limit','upper_limit'] ] = lookup.loc[lookup_ix]

I have also tried to use df.apply() to set the values. See this question.

def do_lookup(row):
    # error:'numpy.float64' object has no attribute 'is_null'
    if row.lower_limit.isnull() and row.upper_limit.isnull():
        if (row.name + index_modifier) in lookup.index:
            return lookup.loc[row.name + index_modifier]

df['lower_limit', 'upper_limit'] = df.apply(do_lookup, axis=1)

or lambda

df['lower_limit', 'upper_limit'] = df.apply(lambda x: lookup.loc[x.name + index_modifier].to_list()
        # isnull() or isnan() would be better
        if ((x.lower_limit == np.NaN) and (x.upper_limit == np.NaN)) 
        # else may not be needed here
        else [np.NaN, np.NaN], 
    axis=1)

This seems like it should be a series of simple steps, but I cannot get them to work correctly. Any insight would be greatly appreciated - my rubber ducky is tired and confused.

rocket_brain
  • 69
  • 1
  • 8

1 Answers1

0

You can use Series.fillna with DataFrame.add_suffix:

index_modifier = '_Male'

init_index=df.index
df=df.T.add_suffix(index_modifier).T
df['lower_limit'].fillna(lookup['lower'],inplace=True)
df['upper_limit'].fillna(lookup['upper'],inplace=True)
df.index=init_index
print(df)


   lower_limit  upper_limit
A          3.0          5.0
B          2.0          NaN
C          4.0          6.0
D          NaN          NaN
ansev
  • 30,322
  • 5
  • 17
  • 31
  • Upon first glance this seems to share the issue with merge. Namely it requires the index to be identical (which I took out for simplicity). If lookup has a modified name, 'C_someModifier', how could I apply fillna? – rocket_brain Sep 30 '19 at 22:43
  • Depending on the modifications of the index, one form or another would be acted upon. It is best that you update your dataframe with an example of the modified index – ansev Sep 30 '19 at 23:03
  • if _something is always the same, perhaps the easiest would be to use df = df.T.add_prefix ('somethin _'). T and then merge or fillna ... – ansev Sep 30 '19 at 23:09
  • I could possibly do an index modifier for the appropriate rows and then un-modify after the lookup. There are multiple layers of complexity though and I'd rather a straightforward use of apply/loc mapping/equivalent if possible. – rocket_brain Sep 30 '19 at 23:27