Display row with False values in validated pandas dataframe column

Question

I was validating 'Price' column in my dataframe. Sample:

    ArticleId   SiteId  ZoneId  Date       Quantity Price   CostPrice
53  194516      9          2    2018-11-26  11.0    40.64   27.73
164 200838      9          2    2018-11-13  5.0     99.75   87.24
373 200838      9          2    2018-11-27  1.0     99.75   87.34

pd.to_numeric(df_sales['Price'], errors='coerce').notna().value_counts()

And I'd love to display those rows with False values so I know whats wrong with them. How do I do that?

True     17984
False       13
Name: Price, dtype: int64

Thank you.

sophocles · Accepted Answer · 2021-02-04T19:14:11.197

1

You could print your rows when price isnull():

print(df_sales[df_sales['Price'].isnull()])

   ArticleId  SiteId  ZoneId       Date  Quantity  Price  CostPrice
1     200838       9       2 2018-11-13         5    NaN     87.240

edited Feb 04 '21 at 19:14

answered Feb 04 '21 at 18:08

sophocles

13,593
3
14
33

Trenton McKinney · Answer 2 · 2021-02-04T19:00:17.747

pd.to_numeric(df['Price'], errors='coerce').isna() returns a Boolean, which can be used to select the rows that cause errors.
- This includes NaN or rows with strings

import pandas as pd

# test data
df = pd.DataFrame({'Price': ['40.64', '99.75', '99.75', pd.NA, 'test', '99. 0', '98 0']})

   Price
0  40.64
1  99.75
2  99.75
3   <NA>
4   test
5  99. 0
6   98 0

# find the value of the rows that are causing issues
problem_rows = df[pd.to_numeric(df['Price'], errors='coerce').isna()]

# display(problem_rows)
   Price
3   <NA>
4   test
5  99. 0
6   98 0

Alternative

Create an extra column and then use it to select the problem rows

df['Price_Updated'] = pd.to_numeric(df['Price'], errors='coerce')

   Price  Price_Updated
0  40.64          40.64
1  99.75          99.75
2  99.75          99.75
3   <NA>            NaN
4   test            NaN
5  99. 0            NaN
6   98 0            NaN

# find the problem rows
problem_rows = df.Price[df.Price_Updated.isna()]

Explanation

Updating the column with .to_numeric(), and then checking for NaNs will not tell you why the rows had to be coerced.

# update the Price row 
df.Price = pd.to_numeric(df['Price'], errors='coerce')

# check for NaN
problem_rows = df.Price[df.Price.isnull()]

# display(problem_rows)
3   NaN
4   NaN
5   NaN
6   NaN
Name: Price, dtype: float64

Display row with False values in validated pandas dataframe column

2 Answers2

Alternative

Explanation