removing NA values from a DataFrame in Python 3.4

Question

import pandas as pd
import statistics

df=print(pd.read_csv('001.csv',keep_default_na=False, na_values=[""]))
print(df)

I am using this code to create a data frame that has no NA values. I have a couple of CSV files and I want to calculate the mean of one of the columns, sulfate. This column has many 'NA' values, which I am trying to exclude. Even with the code above, the 'NA' values aren't excluded from the data frame. Please suggest a fix.

score 2 · Answer 1 · edited Dec 07 '17 at 01:29

I think you should import the .csv file as it is and then manipulate the data frame. Then you can use either of the methods below.

foo[foo.notnull()]

or

foo.dropna()
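
A minimal sketch of this approach, assuming a file with the Date, sulfate, nitrate and ID columns the asker describes. The inline CSV text is a stand-in for '001.csv' and its numeric values are invented for illustration. With its default settings, pd.read_csv parses the string NA as NaN, whereas the question's keep_default_na=False with na_values=[""] turns that default off.

```python
import io
import pandas as pd

# Stand-in for '001.csv'; the numeric values are invented for illustration.
csv_text = """Date,sulfate,nitrate,ID
1/1/2003,NA,NA,1
1/2/2003,3.5,NA,1
1/3/2003,NA,0.8,1
1/4/2003,4.5,1.2,1
"""

# Default settings: the string "NA" is parsed as NaN.
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows where sulfate is present, then average them.
clean = df.dropna(subset=["sulfate"])
sulfate_mean = clean["sulfate"].mean()

# With the question's options, "NA" stays a plain string and is not NaN.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
na_found = raw["sulfate"].isna().sum()

print(sulfate_mean, na_found)
```

Series.mean() skips NaN by default, so df["sulfate"].mean() on the default-parsed frame should give the same value without an explicit dropna.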

edited Dec 07 '17 at 01:29

Pang

answered Mar 17 '15 at 05:04

VoidZero

Thanks for replying, but I get this error: AttributeError: 'NoneType' object has no attribute 'notnull' – Brilliant Mar 18 '15 at 02:16
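
This error is consistent with the line df=print(pd.read_csv(...)) in the question: print() returns None, so df is bound to None rather than to the DataFrame. A minimal sketch of the difference, using a small inline CSV as a stand-in for '001.csv' (the values are illustrative):

```python
import io
import pandas as pd

csv_text = "sulfate,nitrate\nNA,NA\n2.0,1.0\n"  # stand-in for '001.csv'

# print() displays the frame but returns None, so `broken` is None.
broken = print(pd.read_csv(io.StringIO(csv_text)))

# Assign the DataFrame first, then print it as a separate step.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# DataFrame methods such as dropna() are now available.
clean = df.dropna()
```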

score 0 · Answer 2 · edited May 23 '17 at 12:01

Method 1: use pandas notnull (my_func stands for your own function)

df[['A','C']].apply(lambda x: my_func(x) if np.all(pd.notnull(x[1])) else x, axis=1)

Method 2: use np.isfinite

df = df[np.isfinite(df['EPS'])]

Method 3: use dropna

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.loc[::2,0] = np.nan; df.loc[::4,1] = np.nan; df.loc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()     #drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
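
dropna also takes arguments that narrow what gets dropped. A small sketch on a hand-built frame, where the column names sulfate and nitrate are borrowed from the question and the values are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sulfate": [np.nan, 2.0, np.nan, 4.0],
    "nitrate": [np.nan, 1.0, 3.0, np.nan],
})

any_nan = df.dropna()                    # drop rows with at least one NaN
all_nan = df.dropna(how="all")           # drop rows where every value is NaN
by_col = df.dropna(subset=["sulfate"])   # only consider the sulfate column

print(len(any_nan), len(all_nan), len(by_col))
```

For computing the mean of a single column, the subset form is probably the closest fit, since it keeps rows that are missing values only in other columns.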

edited May 23 '17 at 12:01

Community

answered Mar 17 '15 at 04:50

backtrack

Thanks for the reply, but I keep getting this error with all of the above methods. For 'df.dropna(subset = ['sulfate'])': AttributeError: 'NoneType' object has no attribute 'dropna'. I tried to replace NA with '0' so that I can get a mean directly: 'clean = df.replace('NA', 0)' – Brilliant Mar 18 '15 at 00:58
I still get an error: AttributeError: 'NoneType' object has no attribute 'replace' – Brilliant Mar 18 '15 at 01:04
I can't attach here, but I am providing a sample:

Date      sulfate  nitrate  ID
1/1/2003  NA       NA       1
1/2/2003  NA       NA       1
1/3/2003  NA       NA       1
1/4/2003  NA       NA       1
1/5/2003  NA       NA       1

– Brilliant Mar 20 '15 at 13:13

score 0 · Answer 3 · edited Oct 12 '18 at 16:36

I got the same error until I added axis=0 and how='any'.

df=df.dropna(axis=0, how='any')
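
For reference, axis=0 and how='any' are the default values of DataFrame.dropna, so the call above should behave like a plain df.dropna(). One thing that does matter in that line is the assignment back to df: dropna returns a new frame and leaves the original unchanged unless inplace=True is passed. A short sketch with illustrative values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

explicit = df.dropna(axis=0, how="any")
default = df.dropna()

# Calling dropna without assigning the result leaves df untouched.
df.dropna()
rows_before = len(df)

# Assigning the result back is what actually replaces df.
df = df.dropna()
rows_after = len(df)

print(rows_before, rows_after)
```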

edited Oct 12 '18 at 16:36

Yuca

answered Oct 12 '18 at 15:12

Nicolette Ige

score 0 · Answer 4 · answered Sep 24 '19 at 19:49

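columns = df.columns
columsMissng = []
for i in columns:
    # count the rows where column i holds the placeholder '?'
    c = df.loc[df[i] == '?', i].count()
    columsMissng.append((i, c))
count = 0
dropcolumsMissng = []
for i in columsMissng:
    # i is a (column name, placeholder count) pair
    if i[1] > 20000:
        count = count + 1
        dropcolumsMissng.append(i[0])
newDF = df.drop(columns=dropcolumsMissng)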

In place of '?' you can put any value you want to count, and in place of 20000 in `if i[1] > 20000:` you can put your own threshold, such as 50% of the rows.

In case you want to drop columns that contain too many NaN values instead:

columns = newDF.columns.values
dropcolumsMissng = []

for i in columns:
    # count() ignores NaN, so the difference is the number of NaN values
    num_nans = len(newDF) - newDF[i].count()
    if num_nans > 20000:
        dropcolumsMissng.append(i)
newDF = newDF.drop(columns=dropcolumsMissng)
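
The same idea can be written without explicit loops. The sketch below is an alternative to the code above, not the answerer's method: it expresses the threshold as a fraction of rows (0.5 here, an arbitrary choice) and uses isna().mean() to get the share of missing values per column. The frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; '?' plays the role of the placeholder value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [np.nan, np.nan, np.nan, 4.0],
    "c": ["?", "x", "?", "?"],
})

threshold = 0.5  # drop columns with more than 50% missing values

# Treat the placeholder as missing, then measure the NaN share per column.
df = df.replace("?", np.nan)
nan_share = df.isna().mean()

to_drop = nan_share[nan_share > threshold].index
newDF = df.drop(columns=to_drop)

print(list(newDF.columns))
```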
