1

I was validating 'Price' column in my dataframe. Sample:

    ArticleId   SiteId  ZoneId  Date       Quantity Price   CostPrice
53  194516      9          2    2018-11-26  11.0    40.64   27.73
164 200838      9          2    2018-11-13  5.0     99.75   87.24
373 200838      9          2    2018-11-27  1.0     99.75   87.34
pd.to_numeric(df_sales['Price'], errors='coerce').notna().value_counts()

And I'd love to display those rows with False values so I know whats wrong with them. How do I do that?

True     17984
False       13
Name: Price, dtype: int64

Thank you.

Michal
  • 229
  • 3
  • 11

2 Answers2

1

You could print your rows when price isnull():

print(df_sales[df_sales['Price'].isnull()])

   ArticleId  SiteId  ZoneId       Date  Quantity  Price  CostPrice
1     200838       9       2 2018-11-13         5    NaN     87.240
sophocles
  • 13,593
  • 3
  • 14
  • 33
1
  • pd.to_numeric(df['Price'], errors='coerce').isna() returns a Boolean, which can be used to select the rows that cause errors.
    • This includes NaN or rows with strings
import pandas as pd

# test data
df = pd.DataFrame({'Price': ['40.64', '99.75', '99.75', pd.NA, 'test', '99. 0', '98 0']})

   Price
0  40.64
1  99.75
2  99.75
3   <NA>
4   test
5  99. 0
6   98 0

# find the value of the rows that are causing issues
problem_rows = df[pd.to_numeric(df['Price'], errors='coerce').isna()]

# display(problem_rows)
   Price
3   <NA>
4   test
5  99. 0
6   98 0

Alternative

  • Create an extra column and then use it to select the problem rows
df['Price_Updated'] = pd.to_numeric(df['Price'], errors='coerce')

   Price  Price_Updated
0  40.64          40.64
1  99.75          99.75
2  99.75          99.75
3   <NA>            NaN
4   test            NaN
5  99. 0            NaN
6   98 0            NaN

# find the problem rows
problem_rows = df.Price[df.Price_Updated.isna()]

Explanation

  • Updating the column with .to_numeric(), and then checking for NaNs will not tell you why the rows had to be coerced.
# update the Price row 
df.Price = pd.to_numeric(df['Price'], errors='coerce')

# check for NaN
problem_rows = df.Price[df.Price.isnull()]

# display(problem_rows)
3   NaN
4   NaN
5   NaN
6   NaN
Name: Price, dtype: float64
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158