
I am trying to search through a pandas DataFrame to find where it has a missing or NaN entry.

Here is a dataframe that I am working with:

cl_id       a           c         d         e        A1              A2             A3
    0       1   -0.419279  0.843832 -0.530827    text76        1.537177      -0.271042
    1       2    0.581566  2.257544  0.440485    dafN_6        0.144228       2.362259
    2       3   -1.259333  1.074986  1.834653    system                       1.100353
    3       4   -1.279785  0.272977  0.197011     Fifty       -0.031721       1.434273
    4       5    0.578348  0.595515  0.553483   channel        0.640708       0.649132
    5       6   -1.549588 -0.198588  0.373476     audio       -0.508501               
    6       7    0.172863  1.874987  1.405923    Twenty             NaN            NaN
    7       8   -0.149630 -0.502117  0.315323  file_max             NaN            NaN

NOTE: The blank entries are empty strings - this is because there was no alphanumeric content in the file that the dataframe came from.

Given this dataframe, how can I get a list of the indexes where a NaN or blank entry occurs?

edesz

10 Answers


np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:

In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))

In [155]: df.iloc[2,7]
Out[155]: nan

In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]

Finding values which are empty strings could be done with applymap:

In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))

Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead so you could use pd.isnull.
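Following that advice end to end, here is a minimal sketch (the frame and column values are made up for illustration) that normalizes blanks to NaN first and then collects the row indexes the question asks for:

```python
import numpy as np
import pandas as pd

# Toy frame mixing NaN and empty-string "missing" cells (illustrative data)
df = pd.DataFrame({'A1': ['text76', '', 'Twenty'],
                   'A2': [1.5, 0.1, np.nan]})

# Normalize first: turn empty strings into NaN so pd.isnull sees everything
df = df.replace('', np.nan)

# Row indexes that contain at least one missing value
missing_rows = df.index[df.isnull().any(axis=1)].tolist()
print(missing_rows)  # [1, 2]
```

Because the blanks become NaN up front, a single `isnull` pass covers both kinds of missing entry, with no per-cell Python call.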

unutbu
  • For the blank/missing entries (applymap), is there a way to put this in a list? ex.: is there a way to extract a list as [2,5], corresponding to index 2 and index 5? – edesz Nov 26 '14 at 22:25
  • You could make a list of "coordinates" with `zip(np.where(df.applymap(lambda x: x == '')))` – unutbu Nov 26 '14 at 23:06
  • The suggestion in this answer is what I have used: `df = df.replace('', np.nan)` to [replace the blank strings by `NaN`](https://www.quora.com/How-do-I-replace-all-blank-empty-cells-in-a-pandas-dataframe-with-NaNs) and then `df.loc[df.isna().any(axis=1)]` to get the output `DataFrame`. By doing this, as suggested by @unutbu, there is no need for [the slow `.apply()`](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code/54432584#54432584) or `.applymap()`. – edesz Jun 17 '19 at 23:03
  • Adding more details to the above answer, you can get column numbers with null values by `print(set((np.where(pd.isnull(train_df)))[1]))` and print the column names using `df.columns[]` – Rushikesh Gaidhani Jan 22 '20 at 15:04

Try this:

df[df['column_name'] == ''].index

and for NaNs:

df[pd.isna(df['column_name'])].index
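Both checks can also be combined to get the index list in one expression; a minimal sketch with an illustrative column:

```python
import numpy as np
import pandas as pd

# Illustrative column mixing valid values, an empty string, and a NaN
df = pd.DataFrame({'column_name': ['a', '', np.nan, 'b']})

# Rows where the column is NaN or an empty string
idx = df[df['column_name'].isna() | (df['column_name'] == '')].index.tolist()
print(idx)  # [1, 2]
```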
Vyachez

Check whether the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together with the bitwise OR operator |.

Sum along axis 0 to find columns with missing data, then sum along axis 1 to find the index locations of rows with missing data.

missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)

>>> df2.loc[missing_rows, missing_cols]
         A2       A3
2            1.10035
5 -0.508501         
6       NaN      NaN
7       NaN      NaN
Alexander
  • This should be a new accepted answer as it gives the best overview of the missing values. – Kokokoko Aug 15 '21 at 14:31
  • `.eq('')` also works inside `.query()`. So you can find them by `df.query('column_name.eq("")', engine='python')` – mrdaliri Apr 06 '22 at 01:42

I've resorted to

df[(df[column_name].notnull()) & (df[column_name] != '')].index

lately. That filters out both null and empty-string cells in one go; invert the condition to get the index of the missing rows instead.
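A minimal sketch (column name made up) showing the mask and its negation, which yields the missing rows:

```python
import numpy as np
import pandas as pd

# Made-up column mixing valid, empty, and NaN entries
df = pd.DataFrame({'e': ['audio', '', np.nan, 'Fifty']})
column_name = 'e'

# The expression keeps valid rows; negate it to get the missing ones
valid = df[column_name].notnull() & (df[column_name] != '')
missing_index = df[~valid].index.tolist()
print(missing_index)  # [1, 2]
```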

jeremy_rutman

In my opinion, don't waste time: just replace the blanks with NaN, then search for all entries with Na. (This is correct because empty values are missing values anyway.)

import numpy as np                             # to use np.nan 
import pandas as pd                            # to use replace
    
df = df.replace('', np.nan)                    # to get rid of empty values
nan_values = df[df.isna().any(axis=1)]         # to get all rows with Na

nan_values                                     # view df with NaN rows only
Zhannie

Partial solution: for a single string column,

tmp = df['A1'].fillna('')
isEmpty = tmp == ''

gives a boolean Series that is True where there are empty strings or NaN values.
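Extended to return the index positions, with an illustrative frame:

```python
import numpy as np
import pandas as pd

# Illustrative single string column mixing NaN and empty strings
df = pd.DataFrame({'A1': ['text76', np.nan, 'system', '']})

tmp = df['A1'].fillna('')          # NaN -> '' so one comparison covers both
isEmpty = tmp == ''                # boolean Series: True where empty or NaN
empty_idx = df.index[isEmpty].tolist()
print(empty_idx)  # [1, 3]
```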

lahoffm

Another option, covering cases where a cell might contain several spaces, is to use Python's isspace() string method.

df[df.col_name.apply(lambda x: not x.isspace())] # will only return cases that are not pure whitespace

adding NaN values:

df[df.col_name.apply(lambda x: not x.isspace()) & ~df.col_name.isna()]
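A quick runnable check of the whitespace filter, with an illustrative column:

```python
import pandas as pd

# Illustrative column: one value is made of spaces only
df = pd.DataFrame({'col_name': ['audio', '   ', 'Twenty']})

# Keep only rows whose value is not purely whitespace
kept = df[df.col_name.apply(lambda x: not x.isspace())]
print(kept.index.tolist())  # [0, 2]
```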
lukaszberwid
saias

You can also do something like this:

text_empty = ~(df['column name'].str.len() > 0)

df.loc[text_empty].index

The result is the rows that are empty (empty strings have length 0, and the NaN lengths of missing cells compare as False), together with their index numbers.


To obtain all the rows that contain an empty cell in a particular column:

DF_new_row = DF_raw.loc[DF_raw['columnname'] == '']

This gives the subset of DF_raw that satisfies the checking condition.

Shara

You can use string methods with regex to find cells with empty strings:

df[~df.column_name.str.contains(r'\w', na=False)].column_name.count()
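A minimal sketch of that pattern (the raw string and `na=False`, which makes NaN cells count as empty too, are assumptions here):

```python
import numpy as np
import pandas as pd

# Illustrative column; na=False makes NaN cells fail the \w match as well
df = pd.DataFrame({'column_name': ['file_max', '', np.nan]})

mask = df['column_name'].str.contains(r'\w', na=False)
empty_rows = df[~mask].index.tolist()
print(empty_rows)  # [1, 2]
```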

blackgreen
Todd S