
I am trying to search through a pandas DataFrame to find where it has a missing or NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

Given this dataframe, how can I get a list of the indexes where a NaN or blank entry occurs?

edesz

10 Answers


np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
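Following that advice end to end, here is a minimal sketch (the frame and column values are made up for illustration) that normalizes blanks to NaN first and then collects the row indexes the question asks for:

```python
import numpy as np
import pandas as pd

# Toy frame mixing NaN and empty-string "missing" cells (illustrative data)
df = pd.DataFrame({'A1': ['text76', '', 'Twenty'],
                   'A2': [1.5, 0.1, np.nan]})

# Normalize first: turn empty strings into NaN so pd.isnull sees everything
df = df.replace('', np.nan)

# Row indexes that contain at least one missing value
missing_rows = df.index[df.isnull().any(axis=1)].tolist()
print(missing_rows)  # [1, 2]
```

Because the blanks become NaN up front, a single `isnull` pass covers both kinds of missing entry, with no per-cell Python call.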

unutbu
  • For the blank/missing entries (applymap), is there a way to put this in a list? ex.: is there a way to extract a list as [2,5], corresponding to index 2 and index 5? – edesz Nov 26 '14 at 22:25
  • You could make a list of "coordinates" with `zip(np.where(df.applymap(lambda x: x == '')))` – unutbu Nov 26 '14 at 23:06
  • The suggestion in this answer is what I have used: `df = df.replace('', np.nan)` to [replace the blank strings by `NaN`](https://www.quora.com/How-do-I-replace-all-blank-empty-cells-in-a-pandas-dataframe-with-NaNs) and then `df.loc[df.isna().any(axis=1)]` to get the output `DataFrame`. By doing this, as suggested by @unutbu, there is no need for [the slow `.apply()`](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code/54432584#54432584) or `.applymap()`. – edesz Jun 17 '19 at 23:03
  • Adding more details to the above answer, you can get column numbers with null values by `print(set((np.where(pd.isnull(train_df)))[1]))` and print the column names using `df.columns[]` – Rushikesh Gaidhani Jan 22 '20 at 15:04

Try this:

df[df['column_name'] == ''].index

and for NaNs:

df[pd.isna(df['column_name'])].index
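Both checks can also be combined to get the index list in one expression; a minimal sketch with an illustrative column:

```python
import numpy as np
import pandas as pd

# Illustrative column mixing valid values, an empty string, and a NaN
df = pd.DataFrame({'column_name': ['a', '', np.nan, 'b']})

# Rows where the column is NaN or an empty string
idx = df[df['column_name'].isna() | (df['column_name'] == '')].index.tolist()
print(idx)  # [1, 2]
```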
Vyachez

Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together with the bitwise OR operator |.

Sum along axis 0 to find columns with missing data, then sum along axis 1 to find the index locations of rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)

>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501         
6       NaN      NaN
7       NaN      NaN
Alexander
  • This should be a new accepted answer as it gives the best overview of the missing values. – Kokokoko Aug 15 '21 at 14:31
  • `.eq('')` also works inside `.query()`. So you can find them by `df.query('column_name.eq("")', engine='python')` – mrdaliri Apr 06 '22 at 01:42

I've resorted to

df[(df[column_name].notnull()) & (df[column_name] != '')].index

lately. That filters out both null and empty-string cells in one go; invert the condition to get the index of the missing rows instead.
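A minimal sketch (column name made up) showing the mask and its negation, which yields the missing rows:

```python
import numpy as np
import pandas as pd

# Made-up column mixing valid, empty, and NaN entries
df = pd.DataFrame({'e': ['audio', '', np.nan, 'Fifty']})
column_name = 'e'

# The expression keeps valid rows; negate it to get the missing ones
valid = df[column_name].notnull() & (df[column_name] != '')
missing_index = df[~valid].index.tolist()
print(missing_index)  # [1, 2]
```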

jeremy_rutman

In my opinion, don't waste time: just replace the blanks with NaN, then search for all entries with Na. (This is correct because empty values are missing values anyway.)

import numpy as np                             # to use np.nan 
import pandas as pd                            # to use replace
    
df = df.replace('', np.nan)                    # to get rid of empty values
nan_values = df[df.isna().any(axis=1)]         # to get all rows with Na

nan_values                                     # view df with NaN rows only
Zhannie

Partial solution: for a single string column,

tmp = df['A1'].fillna('')
isEmpty = tmp == ''

gives a boolean Series that is True where there are empty strings or NaN values.
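Extended to return the index positions, with an illustrative frame:

```python
import numpy as np
import pandas as pd

# Illustrative single string column mixing NaN and empty strings
df = pd.DataFrame({'A1': ['text76', np.nan, 'system', '']})

tmp = df['A1'].fillna('')          # NaN -> '' so one comparison covers both
isEmpty = tmp == ''                # boolean Series: True where empty or NaN
empty_idx = df.index[isEmpty].tolist()
print(empty_idx)  # [1, 3]
```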

lahoffm

Another option, covering cases where a cell might contain several spaces, is to use Python's isspace() string method.

df[df.col_name.apply(lambda x: not x.isspace())] # will only return cases that are not pure whitespace

adding NaN values:

df[df.col_name.apply(lambda x: not x.isspace()) & ~df.col_name.isna()]
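A quick runnable check of the whitespace filter, with an illustrative column:

```python
import pandas as pd

# Illustrative column: one value is made of spaces only
df = pd.DataFrame({'col_name': ['audio', '   ', 'Twenty']})

# Keep only rows whose value is not purely whitespace
kept = df[df.col_name.apply(lambda x: not x.isspace())]
print(kept.index.tolist())  # [0, 2]
```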
lukaszberwid
saias

You can also do something like this:

text_empty = ~(df['column name'].str.len() > 0)

df.loc[text_empty].index

The result is the rows that are empty (empty strings have length 0, and the NaN lengths of missing cells compare as False), together with their index numbers.


To obtain all the rows that contain an empty cell in a particular column:

DF_new_row = DF_raw.loc[DF_raw['columnname'] == '']

This gives the subset of DF_raw that satisfies the checking condition.

Shara

You can use string methods with regex to find cells with empty strings:

df[~df.column_name.str.contains(r'\w', na=False)].column_name.count()
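A minimal sketch of that pattern (the raw string and `na=False`, which makes NaN cells count as empty too, are assumptions here):

```python
import numpy as np
import pandas as pd

# Illustrative column; na=False makes NaN cells fail the \w match as well
df = pd.DataFrame({'column_name': ['file_max', '', np.nan]})

mask = df['column_name'].str.contains(r'\w', na=False)
empty_rows = df[~mask].index.tolist()
print(empty_rows)  # [1, 2]
```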

blackgreen
Todd S