I'm having trouble understanding why a call to pandas' DataFrame.apply method does not return the result I expect. Could someone explain why the first call to apply shown below returns an unexpected result, while the second one behaves as expected?
import pandas as pd
import numpy as np
df = pd.DataFrame({
    "x": [1, 2, np.nan],
    "y": ["hi", "there", np.nan]
})
print(df)
#>      x      y
#> 0  1.0     hi
#> 1  2.0  there
#> 2  NaN    NaN
print(df.dtypes)
#> x    float64
#> y     object
#> dtype: object
# why does this not return the expected result (which should be True for x
# and False for y)?
print(df.apply(lambda x: np.issubdtype(x, np.number)))
#> x    False
#> y    False
#> dtype: bool
# but something like this returns the expected result (i.e., median imputation
# is used if the series is a number, otherwise NULLs are replaced with "MISSING"):
def replace_nulls(s):
    is_numeric = np.issubdtype(s, np.number)
    missing_value = s.median() if is_numeric else "MISSING"
    return np.where(s.isnull(), missing_value, s)
print(df.apply(replace_nulls))
#>      x        y
#> 0  1.0       hi
#> 1  2.0    there
#> 2  1.5  MISSING
Created on 2019-10-03 by the reprexpy package
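
For comparison, here is a minimal sketch of the result I expected from the first call, written so that it does not depend on what apply passes to the function. It reads each column's dtype from the frame itself using pandas.api.types.is_numeric_dtype and DataFrame.select_dtypes. This is only a workaround sketch, not an explanation of the behavior above; the names numeric_flags and numeric_cols are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "x": [1, 2, np.nan],
    "y": ["hi", "there", np.nan]
})

# Check each column's dtype directly on the frame instead of inside apply,
# so the answer does not depend on how apply constructs its inputs.
# (numeric_flags is a hypothetical name used only for this sketch.)
numeric_flags = pd.Series({col: is_numeric_dtype(df[col]) for col in df.columns})
print(numeric_flags)

# select_dtypes gives the same information as a list of column labels.
numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)
```

Here x is flagged as numeric and y is not, which is the True/False pair I was expecting the first apply call to produce.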