
I have no idea why this isn't working... Why am I not able to get rid of these NaN values?

I have tried the following:

dfa = dfa[dfa['Date Sold_y'].str.len() < 4] #empty
dfa = dfa[dfa['Date Sold_y'] != ''] #no change
dfa = dfa[dfa['Date Sold_y'] != np.nan] #no change

Dtype is string, and sample values below:

['May-30-2018', nan, nan, 'June-11-2014', 'December-3-2021', nan, 'February-2-2022', nan, nan, 'December-30-2011', nan, nan, nan, nan, nan, nan, nan, nan, 'November-30-2021', nan, 'April-1-2020', nan, 'May-10-2007', nan, nan, nan, nan, nan, nan, 'January-28-2022', nan, nan, nan, 'January-18-2022', nan, nan, nan, 'January-12-2022', nan, 'November-15-2021']
RCarmody
    None of those would filter nan values. [np.nan != np.nan](https://stackoverflow.com/q/1565164/15497888) is always a true statement by definition of NaN. You probably mean `dfa = dfa[dfa['Date Sold_y'].notna()]` or even `dfa = dfa.dropna(subset=['Date Sold_y'])` – Henry Ecker Feb 05 '22 at 20:28
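A quick illustration of that comment (a minimal sketch using a hypothetical toy frame that only mirrors the question's column name, not the asker's data):

import numpy as np
import pandas as pd

# NaN is unequal to everything, including itself, so `!= np.nan`
# evaluates to True for every row and nothing gets dropped.
print(np.nan != np.nan)  # True

# Hypothetical toy frame, only the column name matches the question.
dfa = pd.DataFrame({'Date Sold_y': ['May-30-2018', np.nan, 'June-11-2014', np.nan]})

print(dfa[dfa['Date Sold_y'] != np.nan])   # no change: all 4 rows remain
print(dfa[dfa['Date Sold_y'].notna()])     # keeps only the 2 real dates
print(dfa.dropna(subset=['Date Sold_y']))  # same result via dropna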

2 Answers

  1. Maybe the nan values are strings with extra whitespace:
>>> dfa[dfa['Date Sold_y'].str.strip() != 'nan']
         Date Sold_y
0        May-30-2018
3       June-11-2014
4    December-3-2021
6    February-2-2022
9   December-30-2011
18  November-30-2021
20      April-1-2020
22       May-10-2007
29   January-28-2022
33   January-18-2022
37   January-12-2022
39  November-15-2021
  2. You can also reverse the logic and keep only the rows that end with a four-digit year:
>>> dfa[dfa['Date Sold_y'].str.contains(r'\d{4}$')]
  3. Or, if they are really NaN values, as suggested by @HenryEcker:
>>> dfa[dfa['Date Sold_y'].notna()]

# OR

>>> dfa[~dfa['Date Sold_y'].isna()]
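
For completeness, a minimal sketch contrasting the two cases above (a hypothetical two-row frame, not the asker's data):

import numpy as np
import pandas as pd

# Case 1: the "missing" entries are literal strings such as ' nan '
s1 = pd.DataFrame({'Date Sold_y': ['May-30-2018', ' nan ']})
print(s1[s1['Date Sold_y'].str.strip() != 'nan'])  # drops the ' nan ' row
print(s1[s1['Date Sold_y'].notna()])               # drops nothing: 'nan' here is a real string

# Case 2: the missing entries are actual NaN values
s2 = pd.DataFrame({'Date Sold_y': ['May-30-2018', np.nan]})
print(s2[s2['Date Sold_y'].notna()])                            # drops the NaN row
print(s2[s2['Date Sold_y'].str.contains(r'\d{4}$', na=False)])  # regex works too if NaN is mapped to False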
Corralien

By the way, if the values are actually NaN (and not strings), check out the dropna() method of pandas.DataFrame. It lets you drop rows when one or more NaN values are found (you can choose how strict that check is), and you can also restrict the check to a subset of columns.
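
A minimal sketch of that approach, assuming a hypothetical frame that only reuses the question's column name:

import numpy as np
import pandas as pd

# Hypothetical frame; 'Price' is made up for illustration.
dfa = pd.DataFrame({'Date Sold_y': ['May-30-2018', np.nan, 'June-11-2014'],
                    'Price': [100, 200, np.nan]})

# Drop a row only when 'Date Sold_y' is NaN, ignoring NaN in other columns.
print(dfa.dropna(subset=['Date Sold_y']))

# Or check every column: how='any' drops rows with at least one NaN,
# how='all' drops only rows that are entirely NaN.
print(dfa.dropna(how='any'))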

Davide Laghi