693

In Python Pandas, what's the best way to check whether a DataFrame has one (or more) NaN values?

I know about the function pd.isnull, but this returns a DataFrame of booleans, one for each element. This post right here doesn't exactly answer my question either.
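For example, a minimal sketch of the element-wise behaviour I mean (the column name is just made up for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan]})

pd.isnull(df)   # element-wise DataFrame of booleans
#        A
# 0  False
# 1   True
# What I'd like instead is a single True/False for the whole DataFrame.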

Maven Carvalho
hlin117
    check out [summary of the counts of missing data in pandas](http://stackoverflow.com/questions/22257527/how-do-i-get-a-summary-of-the-counts-of-missing-data-in-pandas) – LinkBerest Apr 09 '15 at 05:16
  • Best answer : https://stackoverflow.com/questions/22257527/how-do-i-get-a-summary-count-of-missing-nan-data-by-column-in-pandas/75632616#75632616 – Jaya Raghavendra Mar 03 '23 at 23:15

28 Answers

826

jwilner's response is spot on. I was exploring to see if there's a faster option, since in my experience, summing flat arrays is (strangely) faster than counting. This code seems faster:

df.isnull().values.any()

[perfplot benchmark chart comparing the four kernels below]

import numpy as np
import pandas as pd
import perfplot


def setup(n):
    df = pd.DataFrame(np.random.randn(n))
    df[df > 0.9] = np.nan
    return df


def isnull_any(df):
    return df.isnull().any()


def isnull_values_sum(df):
    return df.isnull().values.sum() > 0


def isnull_sum(df):
    return df.isnull().sum() > 0


def isnull_values_any(df):
    return df.isnull().values.any()


perfplot.save(
    "out.png",
    setup=setup,
    kernels=[isnull_any, isnull_values_sum, isnull_sum, isnull_values_any],
    n_range=[2 ** k for k in range(25)],
)

df.isnull().sum().sum() is a bit slower, but of course, has additional information -- the number of NaNs.

Nico Schlömer
S Anand
  • Thank you for the time benchmarks. It's surprising that `pandas` doesn't have a built-in function for this. As @JGreenwell's post shows, `df.describe()` can do this, but there's no direct function. – hlin117 Apr 09 '15 at 06:37
  • I just timed `df.describe()` (without finding `NaN`s). With a 1000 x 1000 array, a single call takes 1.15 seconds. – hlin117 Apr 09 '15 at 06:43
  • +1. Also, `df.isnull().values.sum()` is a bit faster than `df.isnull().values.flatten().sum()` – Zero Apr 12 '15 at 21:02
  • Ah, good catch @JohnGalt -- I'll change my solution to remove the `.flatten()` for posterity. Thanks. – S Anand Apr 13 '15 at 01:25
  • You didn't try `df.isnull().values.any()`; for me it is faster than the others. – CK1 Jul 15 '15 at 15:28
  • I agree with @CK1. For me `df.isnull().values.any()` is twice as fast (0.7 ms) as `df.isnull().values.sum()` (1.4 ms) – Jack Kelly Aug 31 '15 at 11:01
  • `np.isnan(df.values).any()` works a bit faster, but it doesn't work for object dtype – Eugene Pakhomov Jan 22 '17 at 19:09
  • `df.shape[1] - df.dropna(axis = 1).shape[1]` would quickly confirm how many columns have null values in the entire dataframe – Nim J Feb 07 '18 at 05:25
  • This also works with a single column, e.g., `df['col1'].isnull().values.any()` – Josiah Yoder Jul 28 '20 at 19:53
  • I'm surprised that no one has mentioned that `isnull_any`'s implementation is wrong. It is returning a Series object, not a boolean. One has to return `df.isnull().any().any()` instead of `df.isnull().any()` to get a boolean. – AXO Aug 24 '23 at 06:33
234

You have a couple of options.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,6))
# Make a few areas have NaN values
df.iloc[1:3,1] = np.nan
df.iloc[5,3] = np.nan
df.iloc[7:9,5] = np.nan

Now the data frame looks something like this:

          0         1         2         3         4         5
0  0.520113  0.884000  1.260966 -0.236597  0.312972 -0.196281
1 -0.837552       NaN  0.143017  0.862355  0.346550  0.842952
2 -0.452595       NaN -0.420790  0.456215  1.203459  0.527425
3  0.317503 -0.917042  1.780938 -1.584102  0.432745  0.389797
4 -0.722852  1.704820 -0.113821 -1.466458  0.083002  0.011722
5 -0.622851 -0.251935 -1.498837       NaN  1.098323  0.273814
6  0.329585  0.075312 -0.690209 -3.807924  0.489317 -0.841368
7 -1.123433 -1.187496  1.868894 -2.046456 -0.949718       NaN
8  1.133880 -0.110447  0.050385 -1.158387  0.188222       NaN
9 -0.513741  1.196259  0.704537  0.982395 -0.585040 -1.693810
  • Option 1: df.isnull().any().any() - This returns a boolean value

You already know about isnull(), which returns a DataFrame like this:

       0      1      2      3      4      5
0  False  False  False  False  False  False
1  False   True  False  False  False  False
2  False   True  False  False  False  False
3  False  False  False  False  False  False
4  False  False  False  False  False  False
5  False  False  False   True  False  False
6  False  False  False  False  False  False
7  False  False  False  False  False   True
8  False  False  False  False  False   True
9  False  False  False  False  False  False

If you make it df.isnull().any(), you can find just the columns that have NaN values:

0    False
1     True
2    False
3     True
4    False
5     True
dtype: bool

One more .any() will tell you if any of the above are True:

> df.isnull().any().any()
True
  • Option 2: df.isnull().sum().sum() - This returns an integer of the total number of NaN values:

This works the same way .any().any() does: first it sums the number of NaN values in each column, then it sums those per-column totals:

df.isnull().sum()
0    0
1    2
2    0
3    1
4    0
5    2
dtype: int64

Finally, to get the total number of NaN values in the DataFrame:

df.isnull().sum().sum()
5
Manu CJ
Andy
105

To find out which rows have NaNs in a specific column:

nan_rows = df[df['name column'].isnull()]
Håken Lid
Ihor Ivasiuk
    To find out which rows do not have NaNs in a specific column: `non_nan_rows = df[df['name column'].notnull()]`. – Elmex80s Nov 27 '17 at 10:00
67

If you need to know how many rows there are with "one or more NaNs":

df.isnull().T.any().T.sum()

Or if you need to pull out these rows and examine them:

nan_rows = df[df.isnull().T.any()]
hobs
57

df.isnull().any().any() should do it.

jwilner
34

Super Simple Syntax: df.isna().any(axis=None)

Starting from v0.23.2, you can use DataFrame.isna + DataFrame.any(axis=None) where axis=None specifies logical reduction over the entire DataFrame.

# Setup
df = pd.DataFrame({'A': [1, 2, np.nan], 'B' : [np.nan, 4, 5]})
df
     A    B
0  1.0  NaN
1  2.0  4.0
2  NaN  5.0

df.isna()

       A      B
0  False   True
1  False  False
2   True  False

df.isna().any(axis=None)
# True

Useful Alternatives

numpy.isnan
Another performant option if you're running older versions of pandas.

np.isnan(df.values)

array([[False,  True],
       [False, False],
       [ True, False]])

np.isnan(df.values).any()
# True

Alternatively, check the sum:

np.isnan(df.values).sum()
# 2

np.isnan(df.values).sum() > 0
# True

Series.hasnans
You can also iteratively call Series.hasnans. For example, to check if a single column has NaNs,

df['A'].hasnans
# True

And to check if any column has NaNs, you can use a comprehension with any (which is a short-circuiting operation).

any(df[c].hasnans for c in df)
# True

This is actually very fast.

cs95
  • This might not be the fastest option but it is the most readable one in 2022 :) – Joe Oct 18 '22 at 09:18
24

Adding to hobs' brilliant answer: I am very new to Python and Pandas, so please point out if I am wrong.

To find out which rows have NaNs:

nan_rows = df[df.isnull().any(axis=1)]

This performs the same operation without the need for transposing, by passing axis=1 to any() to check whether True is present in each row.

Ankit
  • This gets rid of **two** transposes! Love your concise `any(axis=1)` simplification. – hobs Sep 09 '18 at 22:22
21

Let df be the name of the Pandas DataFrame, and any value that is numpy.nan is a null value.

  1. If you want to see which columns have nulls and which do not (just True and False)

    df.isnull().any()

  2. If you want to see only the columns that have nulls

    df.loc[:, df.isnull().any()].columns
    
  3. If you want to see the count of nulls in every column

    df.isna().sum()
    
  4. If you want to see the percentage of nulls in every column

    df.isna().sum()/(len(df))*100
    
  5. If you want to see the percentage of nulls in columns only with nulls (a shorter equivalent is sketched right after this list):

df.loc[:,list(df.loc[:,df.isnull().any()].columns)].isnull().sum()/(len(df))*100
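For what it's worth, a possibly shorter equivalent of item 5 (a sketch, assuming the same df): isnull().mean() gives the per-column fraction of nulls, which you can scale to a percentage and filter.

pct = df.isnull().mean() * 100
pct[pct > 0]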

EDIT 1:

If you want to see where your data is missing visually:

import missingno
missingdata_df = df.columns[df.isnull().any()].tolist()
missingno.matrix(df[missingdata_df])
Naveen Reddy Marthala
  • _If you want to see the count of nulls in every column..._ That seems insane, why not just do `df.isna().sum()` ? – AMC Feb 16 '20 at 04:09
11

Since no one has mentioned it, there is another attribute called hasnans.

df[i].hasnans will return True if one or more of the values in the pandas Series is NaN, False if not. Note that it's not a function.

Tested on pandas versions 0.19.2 and 0.20.2.
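A quick sketch of the attribute in action (on a Series only; as the comment below notes, DataFrames don't have it):

s = pd.Series([1.0, np.nan])
s.hasnans   # True -- an attribute, not a method, so no parentheses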

chmodsss
    This answer is incorrect. Pandas Series have this attribute but DataFrames do not. If `df = DataFrame([1,None], columns=['foo'])`, then `df.hasnans` will throw an `AttributeError`, but `df.foo.hasnans` will return `True`. – Nathan Thompson Oct 11 '17 at 22:27
8

Since pandas has to find this out for DataFrame.dropna(), I took a look to see how they implement it and discovered that they made use of DataFrame.count(), which counts all non-null values in the DataFrame. Cf. pandas source code. I haven't benchmarked this technique, but I figure the authors of the library are likely to have made a wise choice for how to do it.
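For reference, a rough sketch of that count()-based check (my own illustration, not the library's actual code):

# count() returns the number of non-null values per column, so any column
# whose count falls short of len(df) has at least one missing value.
has_nan = (df.count() < len(df)).any()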

Marshall Farrier
8

I've been using the following, casting the value to a string and checking whether it equals 'nan':

   (str(df.at[index, 'column']) == 'nan')

This lets me check a specific value in a Series, rather than just whether a NaN appears somewhere within it.
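For what it's worth, pd.isna also accepts a single scalar, which avoids the string cast (a small sketch using the same hypothetical index and column name):

pd.isna(df.at[index, 'column'])   # True if that one cell is NaN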

Peter Thomas
7
df.isnull().sum()

This will give you the count of NaN values in each column of the DataFrame.

Adarsh singh
6

Try the following:

df.isnull().sum()

or

df.isna().values.any()
Suraj Rao
5

Just use math.isnan(x): it returns True if x is a NaN (not a number), and False otherwise.
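Since math.isnan only works on a scalar, you'd check one cell at a time rather than the whole frame (a minimal sketch, with a made-up float column 'A'):

import math
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan]})
math.isnan(df.at[1, 'A'])   # True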

frankchen0130
4

Here is another interesting way of finding nulls and replacing them with a calculated value:

    #Creating the DataFrame
    testdf2 = pd.DataFrame({'Monthly':[10,20,30,40,50],'Tenure':[1,2,3,4,5],'Yearly':[10,40,np.nan,np.nan,250]})
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3     NaN
    3       40       4     NaN
    4       50       5   250.0

    #Identifying the rows with empty columns
    nan_rows = testdf2[testdf2['Yearly'].isnull()]
    >>> nan_rows
       Monthly  Tenure  Yearly
    2       30       3     NaN
    3       40       4     NaN

    #Getting the row indexes into a list
    >>> index = list(nan_rows.index)
    >>> index
    [2, 3]

    # Replacing null values with a calculated value (.loc avoids chained assignment)
    >>> for i in index:
        testdf2.loc[i, 'Yearly'] = testdf2.loc[i, 'Monthly'] * testdf2.loc[i, 'Tenure']
    >>> testdf2
       Monthly  Tenure  Yearly
    0       10       1    10.0
    1       20       2    40.0
    2       30       3    90.0
    3       40       4   160.0
    4       50       5   250.0
Jagannath Banerjee
4

We can see the null values present in the dataset by generating a heatmap with the seaborn module:

import pandas as pd
import seaborn as sns
dataset=pd.read_csv('train.csv')
sns.heatmap(dataset.isnull(),cbar=False)
Aditya
3

The best would be to use:

df.isna().any().any()

Here is why: isna() is used to define isnull(), so the two are of course identical.

This is even faster than the accepted answer and covers all 2D pandas DataFrames.

prosti
3

To do this we can use the statement df.isna().any(). This checks all of our columns and returns True for each column that has any missing values or NaNs, and False for each column that has none.
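A small sketch of the output, assuming a toy frame with one NaN (chain another .any() if you want a single boolean for the whole frame):

df = pd.DataFrame({'A': [1, 2], 'B': [np.nan, 4]})
df.isna().any()
# A    False
# B     True
# dtype: bool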

Pobaranchuk
3

I recommend using the values attribute, as evaluation on the underlying NumPy array is much faster:

arr = np.random.randn(100, 100)
arr[40, 40] = np.nan
df = pd.DataFrame(arr)

%timeit np.isnan(df.values).any()  # 7.56 µs
%timeit np.isnan(df).any()         # 627 µs
%timeit df.isna().any(axis=None)   # 572 µs

Result:

7.56 µs ± 447 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
627 µs ± 40.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
572 µs ± 15.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Note: You need to run %timeit in a Jupyter notebook (or IPython) for this to work.

Daniel Malachov
3

This will only include columns with at least one null/NaN value.

df.isnull().sum()[df.isnull().sum() > 0]
Brndn
2

Or you can use .info() on the DataFrame, such as:

df.info(null_counts=True), which returns the number of non-null rows per column, for example:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3276314 entries, 0 to 3276313
Data columns (total 10 columns):
n_matches                          3276314 non-null int64
avg_pic_distance                   3276314 non-null float64
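Note that on recent pandas versions the null_counts argument has been removed; the equivalent there (assuming pandas >= 1.2) is:

df.info(show_counts=True)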
Jan Sila
2
import missingno as msno
msno.matrix(df)  # just to visualize. no missing value.


Ikbel
2

Another way is to dropna and check if the lengths are equivalent:

>>> len(df.dropna()) != len(df)
True
U13-Forward
1
df.apply(axis=0, func=lambda x : any(pd.isnull(x)))

This will check, for each column, whether it contains any NaN or not.

0

You can not only check whether any 'NaN' exists, but also get the percentage of 'NaN's in each column, using the following:

df = pd.DataFrame({'col1':[1,2,3,4,5],'col2':[6,np.nan,8,9,10]})  
df  

   col1 col2  
0   1   6.0  
1   2   NaN  
2   3   8.0  
3   4   9.0  
4   5   10.0  


df.isnull().sum()/len(df)  
col1    0.0  
col2    0.2  
dtype: float64
eyllanesc
Nizam
0

Bar representation for missing values

import missingno
missingno.bar(df)  # gives the exact number of values present and missing per column
0

This code makes your life easy:

import sidetable

df.stb.missing()

Check this out : https://github.com/chris1610/sidetable


Jaya Raghavendra
-1

Depending on the type of data you're dealing with, you could also just get the value counts of each column while performing your EDA by setting dropna to False.

for col in df:
   print(df[col].value_counts(dropna=False))

Works well for categorical variables, not so much when you have many unique values.

unique_beast
  • I think this is inefficient. Built-in functions of pandas are more neat/terse. Avoids cluttering of the ipython notebook. – Koo Apr 10 '19 at 17:15