
I'm having trouble understanding why a call to pandas' `DataFrame.apply` method does not return the expected result. Could someone explain why the first call to `apply` shown below gives an unexpected result, while the second one works as expected?

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "x": [1, 2, np.nan],
    "y": ["hi", "there", np.nan]
})
print(df)
#>      x      y
#> 0  1.0     hi
#> 1  2.0  there
#> 2  NaN    NaN
print(df.dtypes)
#> x    float64
#> y     object
#> dtype: object

# why does something like this not return the expected result (which should
# be True, False)?
print(df.apply(lambda x: np.issubdtype(x, np.number)))
#> x    False
#> y    False
#> dtype: bool

# but something like this does return the expected result (i.e., median imputation
# is used if the series is numeric, otherwise nulls are replaced with "MISSING"):
def replace_nulls(s):
    is_numeric = np.issubdtype(s, np.number)
    missing_value = s.median() if is_numeric else "MISSING"
    return np.where(s.isnull(), missing_value, s)

print(df.apply(replace_nulls))
#>      x        y
#> 0  1.0       hi
#> 1  2.0    there
#> 2  1.5  MISSING

Created on 2019-10-03 by the reprexpy package
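For comparison, here is a minimal sketch (not from the original post) of getting the expected True/False result without `DataFrame.apply` at all. It assumes only that `df.dtypes` is used: that Series already maps each column name to that column's own dtype, so the check never sees whatever Series objects `apply` builds internally. `pd.api.types.is_numeric_dtype` is shown as an alternative to `np.issubdtype`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, np.nan],
    "y": ["hi", "there", np.nan],
})

# df.dtypes is a Series of column name -> dtype (float64 for x, object for y),
# so each check runs on the real per-column dtype.
is_numeric = df.dtypes.apply(lambda d: np.issubdtype(d, np.number))

# Same check using pandas' own dtype helper.
is_numeric_pd = df.dtypes.map(pd.api.types.is_numeric_dtype)

print(is_numeric)
print(is_numeric_pd)
```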

Chris
  • That seems broken to me. `pd.Series({k: np.issubdtype(v, np.number) for k, v in df.items()})` works, but yours doesn't. – piRSquared Oct 03 '19 at 15:54
  • Hmm, yeah, not sure why a comprehension like that would work where apply doesn't. – Chris Oct 03 '19 at 16:15
  • `apply` does a lot of checking and things that make it safe. This is a bug and I don't have the patience to unwind it at the moment. The comprehension is exactly what it seems to be and therefore no surprises. Let me know if you want to submit a bug report. Otherwise, I will. – piRSquared Oct 03 '19 at 16:25
  • I'll submit a bug report, thanks. I'm seeing some more weirdness that may add to the story. – Chris Oct 03 '19 at 16:43
  • I opened https://github.com/pandas-dev/pandas/issues/28773 – Chris Oct 03 '19 at 17:32
  • As of pandas 1.3.4, this works as OP intended. –  Apr 05 '22 at 17:29
  • Is this thread helpful? https://stackoverflow.com/questions/52436356/pandas-numpy-nan-none-comparison – Vae Jiang Apr 11 '22 at 20:23
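piRSquared's comprehension workaround can be extended to the median-imputation case as well. The sketch below is an illustration, not code from the thread: it iterates over `df.items()` so each column keeps its own dtype, passes `v.dtype` explicitly to `np.issubdtype` rather than the Series itself, and uses `fillna` in place of `np.where`. The helper name `impute_column` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, np.nan],
    "y": ["hi", "there", np.nan],
})

# Comprehension from the comment above: one (name, Series) pair per column,
# each Series carrying its own dtype.
is_numeric = pd.Series({k: np.issubdtype(v.dtype, np.number) for k, v in df.items()})


def impute_column(s: pd.Series) -> pd.Series:
    """Hypothetical helper: median for numeric columns, "MISSING" otherwise."""
    if pd.api.types.is_numeric_dtype(s.dtype):
        return s.fillna(s.median())  # median() skips NaN by default
    return s.fillna("MISSING")


# Rebuild the frame column by column instead of calling df.apply.
result = pd.DataFrame({name: impute_column(col) for name, col in df.items()})
print(result)
```

Because the loop hands each column to the function directly, it does not depend on how `apply` constructs the Series it passes to a function.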

0 Answers