
I need to drop all rows that have a null value in column C. Here is the code:

import pandas as pd

infile = r"C:\****"

df = pd.read_csv(infile)

The data looks like this:

A   B   C   D
1   1   NaN 3
2   3   7   NaN
4   5   NaN 8
5   NaN 4   9
NaN 1   2   NaN
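To reproduce the problem without the original file, here is a minimal sketch that reads the same values from an in-memory string. It assumes the data is whitespace-delimited like the table above. The real file is presumably comma-separated, so the sep argument is an assumption, and raw is a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the table above as plain text.
raw = """A   B   C   D
1   1   NaN 3
2   3   7   NaN
4   5   NaN 8
5   NaN 4   9
NaN 1   2   NaN
"""

# sep=r"\s+" splits on runs of whitespace; the literal "NaN" is parsed as a
# missing value because it is in read_csv's default na_values.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```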

There are two basic methods I have attempted.

Method 1 (source: How to drop rows of Pandas DataFrame whose value in certain columns is NaN):

df.dropna()

The result is an empty dataframe, which makes sense because there is an NaN value in every row.
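The empty result is expected here. With the default how="any", dropna() removes a row if any of its cells is missing, and every row above has at least one NaN. The following quick check rebuilds the same data inline, with values copied from the table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, np.nan, 3],
     [2, 3, 7, np.nan],
     [4, 5, np.nan, 8],
     [5, np.nan, 4, 9],
     [np.nan, 1, 2, np.nan]],
    columns=["A", "B", "C", "D"],
)

# Every row contains at least one NaN, so how="any" (the default) drops them all.
print(df.isna().any(axis=1).all())  # True
print(df.dropna().empty)            # True

# how="all" only drops rows where every value is NaN, so nothing is dropped here.
print(len(df.dropna(how="all")))    # 5
```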

df.dropna(subset=[3])

For this method I experimented with the subset argument, using both the column index number and the column name. The dataframe is still empty.
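Note that subset expects column labels, not positions. Here the labels are 'A' to 'D', so subset=[3] does not mean "the fourth column". Even counting positions from zero, 3 would be column D; C is position 2. The sketch below shows the label form and a positional lookup that goes through df.columns. The names by_label and by_position are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, np.nan, 3],
     [2, 3, 7, np.nan],
     [4, 5, np.nan, 8],
     [5, np.nan, 4, 9],
     [np.nan, 1, 2, np.nan]],
    columns=["A", "B", "C", "D"],
)

# By label: only missing values in column C cause a row to be dropped.
by_label = df.dropna(subset=["C"])

# By position: translate the zero-based position (C is 2) into its label first.
by_position = df.dropna(subset=[df.columns[2]])

print(by_label.index.tolist())       # [1, 3, 4]
print(by_label.equals(by_position))  # True
```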

Method 2 (source: Deleting DataFrame row in Pandas based on column value):

df = df[df.C.notnull()]

This still results in an empty dataframe!

What am I doing wrong?
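Method 2 is correct on its own. One possible explanation for the empty result is that df had already been reassigned to the output of the plain dropna() from method 1, so every later filter started from an empty frame. This is also what the comments below suggest. The sketch below reproduces that sequence under the assumption that such a reassignment happened. The variable original is hypothetical and stands in for re-reading the file.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, np.nan, 3],
     [2, 3, 7, np.nan],
     [4, 5, np.nan, 8],
     [5, np.nan, 4, 9],
     [np.nan, 1, 2, np.nan]],
    columns=["A", "B", "C", "D"],
)
original = df.copy()  # hypothetical: stands in for re-running pd.read_csv(infile)

# Assumed earlier step: method 1's plain dropna() result assigned back to df.
df = df.dropna()
print(len(df[df.C.notnull()]))  # 0: the frame was already empty

# Starting again from the unfiltered data, method 2 keeps rows 1, 3 and 4.
df = original
df = df[df.C.notnull()]
print(df.index.tolist())  # [1, 3, 4]
```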

geolish

2 Answers

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,1,np.nan,3],[2,3,7,np.nan],[4,5,np.nan,8],[5,np.nan,4,9],[np.nan,1,2,np.nan]], columns=['A','B','C','D'])
df = df[df['C'].notnull()]  # keep only rows where column C is not NaN
print(df)
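Here is how this works. df['C'].notnull() returns a boolean Series aligned with the row index, which is True where C has a value. Indexing the frame with that Series keeps only the True rows. The small sketch below prints the intermediate mask. The names mask and kept are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, np.nan, 3],
     [2, 3, 7, np.nan],
     [4, 5, np.nan, 8],
     [5, np.nan, 4, 9],
     [np.nan, 1, 2, np.nan]],
    columns=["A", "B", "C", "D"],
)

# Boolean mask: one True/False per row, True where column C is not missing.
mask = df["C"].notnull()
print(mask.tolist())  # [False, True, False, True, True]

# Boolean indexing keeps the rows where the mask is True; the original
# index labels (1, 3, 4) are preserved rather than renumbered.
kept = df[mask]
print(kept.index.tolist())  # [1, 3, 4]
```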
flyingmeatball
  • @EdChum He didn't like notnull() up above, so I gave him some variety :) – flyingmeatball Apr 25 '16 at 19:13
  • It looks to me the OP got an empty dataframe due to the first incorrect operation – EdChum Apr 25 '16 at 19:19
  • While this code may answer the question, providing additional context regarding why and/or how it answers the question would significantly improve its long-term value. Please edit your answer to add some explanation. – CodeMouse92 Apr 25 '16 at 20:28

This just proves that your method 2 works properly (at least with pandas 0.18.0):

In [100]: df
Out[100]:
     A    B    C    D
0  1.0  1.0  NaN  3.0
1  2.0  3.0  7.0  NaN
2  4.0  5.0  NaN  8.0
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [101]: df.dropna(subset=['C'])
Out[101]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [102]: df[df.C.notnull()]
Out[102]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN

In [103]: df = df[df.C.notnull()]

In [104]: df
Out[104]:
     A    B    C    D
1  2.0  3.0  7.0  NaN
3  5.0  NaN  4.0  9.0
4  NaN  1.0  2.0  NaN
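On newer pandas versions (0.21 and later), notna() is also available as an alias of notnull(). The following quick sketch assumes a current pandas install. It confirms that the dropna(subset=...) form and the boolean-mask form select the same rows. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 4, 5, np.nan],
     "B": [1, 3, 5, np.nan, 1],
     "C": [np.nan, 7, np.nan, 4, 2],
     "D": [3, np.nan, 8, 9, np.nan]}
)

via_dropna = df.dropna(subset=["C"])
via_mask = df[df["C"].notna()]       # notna() is an alias of notnull()
via_notnull = df[df["C"].notnull()]

print(via_dropna.equals(via_mask))   # True
print(via_mask.equals(via_notnull))  # True
```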
MaxU - stand with Ukraine