0

I have a matrix with headers and want to remove all rows where the column "Closed Date" contains "NaN".

Input:

raw_data.ix[~(raw_data['Closed Date'] == "NaN")]

Output:

Closed Date
NaN
NaN
9/28/2017 19:51     
NaN

Why is "NaN" still there?

cool_beans
  • Try `raw_data.ix[~(raw_data['Closed Date'] == None)]` or `raw_data.ix[~(raw_data['Closed Date'] == np.nan)]` – Sociopath Feb 26 '18 at 18:19
  • Possible duplicate of [How to drop rows of Pandas DataFrame whose value in certain columns is NaN](https://stackoverflow.com/questions/13413590/how-to-drop-rows-of-pandas-dataframe-whose-value-in-certain-columns-is-nan) – DJK Feb 26 '18 at 18:34

2 Answers

1

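NaN is not a string, so comparing the column against "NaN" never matches. Test for missing values with .isnull() or .notnull() instead. Note that .ix has been removed from recent pandas releases, so .loc is used here:

raw_data.loc[~raw_data['Closed Date'].isnull()]

or

raw_data.loc[raw_data['Closed Date'].notnull()]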
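
As a rough illustration of the difference, here is a minimal sketch on made-up data. The `Ticket` column and the sample values are assumptions for the example, not part of the original data set.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the "Closed Date" column from the question
raw_data = pd.DataFrame({
    "Closed Date": [np.nan, np.nan, "9/28/2017 19:51", np.nan],
    "Ticket": [1, 2, 3, 4],  # made-up second column for illustration
})

# Comparing against the string "NaN" matches nothing, so no rows are dropped
unchanged = raw_data.loc[~(raw_data["Closed Date"] == "NaN")]

# notnull() builds a boolean mask that is True only for non-missing values
cleaned = raw_data.loc[raw_data["Closed Date"].notnull()]

print(len(unchanged))  # 4
print(len(cleaned))    # 1
```

The mask approach is handy when the filter needs to be combined with other conditions using `&` or `|`.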
Sebastian
0

The NaN you are seeing is not a string. It stands for "Not a Number" and is used to represent "Not Available" data in pandas / numpy.
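
A small sketch of why equality checks do not find NaN may help here. It assumes NumPy's `np.nan` is the value stored in the column, which is the usual case when pandas reads missing data; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# NaN is a floating-point value, not text
is_float = isinstance(np.nan, float)

# NaN never compares equal to anything, including itself or the string "NaN"
equals_itself = np.nan == np.nan
equals_string = np.nan == "NaN"

# pd.isna is the reliable way to detect it
detected = pd.isna(np.nan)

print(is_float)       # True
print(equals_itself)  # False
print(equals_string)  # False
print(detected)       # True
```

This is also why filtering with `== np.nan` leaves the NaN rows in place.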

You can remove all rows where Closed Date is NaN via pd.DataFrame.dropna:

raw_data = raw_data.dropna(subset=['Closed Date'])
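
To show what `subset` does, here is a short sketch on invented data. The `Status` column and all values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 lacks a closed date, row 2 lacks a status
raw_data = pd.DataFrame({
    "Closed Date": [np.nan, "9/28/2017 19:51", "9/29/2017 08:00"],
    "Status": ["open", "closed", np.nan],
})

# subset limits the NaN check to "Closed Date"; rows are dropped by default
by_column = raw_data.dropna(subset=["Closed Date"])

# Without subset, any NaN in any column drops the row
any_nan = raw_data.dropna()

print(by_column.index.tolist())  # [1, 2]
print(any_nan.index.tolist())    # [1]
```

Passing `axis=1` together with `subset` would make pandas look for 'Closed Date' among the row labels rather than the columns, so the default row-wise behaviour is the one wanted here.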
jpp