Nested np.where can replace values based on specified conditions. As I understand it, though, df.where (pandas DataFrame.where) is the pandas-native counterpart.
Unlike np.where, df.where returns the original df value where the condition is true and the specified alternative where it is false.
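To make that difference concrete, here is a minimal sketch on a made-up three-value Series (the values and the replacement 0 are illustrative assumptions, not my real data). It compares Series.where, np.where, and Series.mask on the same condition.

```python
import numpy as np
import pandas as pd

# Illustrative values only (assumed for this sketch).
s = pd.Series([20.0, np.nan, 26.0])
cond = s.notnull()

# Series.where keeps the original value where cond is True
# and substitutes `other` (here 0) where cond is False.
kept = s.where(cond, 0)

# np.where takes both branches explicitly:
# np.where(condition, value_if_true, value_if_false) and returns an ndarray.
picked = np.where(cond, s, 0)

# Series.mask is the mirror image of where: it replaces where the condition is True.
masked = s.mask(s.isnull(), 0)

print(kept.tolist(), picked.tolist(), masked.tolist())
```

All three give the same values here; the only thing that changes is which side of the condition carries the replacement.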
''' #snippet of data: df_age
     0  1
0   20  3
1   23  4
2   26  5
3   29  2
4  NaN  1
5  NaN  2
6  NaN  3
7  NaN  0
'''
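For anyone who wants to reproduce this, the snippet above can be rebuilt roughly as follows. This is a sketch that assumes integer column labels 0 and 1, as shown in the printout; the real frame may carry other columns or labels.

```python
import numpy as np
import pandas as pd

# Rebuild the data snippet; column 0 is age, column 1 is pclass.
df_age = pd.DataFrame({
    0: [20, 23, 26, 29, np.nan, np.nan, np.nan, np.nan],
    1: [3, 4, 5, 2, 1, 2, 3, 0],
})

print(df_age)
```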
import numpy as np
import pandas as pd

## define function to check NaN/null age and assign age
def impute_age(cols):
    age = cols[0]     # first column: age Series
    #print(f'age: {age}') #debug
    pclass = cols[1]  # second column: pclass Series
    '''
    ## Nested np.where that works as expected
    age_null = np.where((cols[0].isnull()) & (cols[1]==1), 37,
               np.where((cols[0].isnull()) & (cols[1]==2), 39,
               np.where(cols[0].isnull(), 24,  # null age with any other pclass
               cols[0]))).astype(int)
    '''
    ## nested pd.where
    age_null = age.where((cols[0].notnull()) & ... ... ... )
    print(f'age col[0]: \n{age_null}')

impute_age(df_age)
Q1: What is a more Pythonic way to write the working nested np.where?
Q2: [Main question] How do I write the df.where (age.where) call so that it achieves the same result as the np.where?
NB: I'll be testing out using mask at a later stage.
NB: I used nested np.where to replace nested if ...
NB: I'm trying out pd.where (and later mask, map, substitute) for demonstration purposes and, at the same time, to learn how best to write them Pythonically.
NB: I took note of the following:
SO: df.where
SO: truth value
np.where
pd.where
SO: pd.where col change
SO: pd.where nested loop
SO: pd.where convert values
SO: pd.where elif logic
SO: pd.where nested?