pd.to_numeric(df['Price'], errors='coerce').isna()
returns a Boolean
, which can be used to select the rows that cause errors.
- This includes
NaN
or rows with strings
import pandas as pd
# test data
df = pd.DataFrame({'Price': ['40.64', '99.75', '99.75', pd.NA, 'test', '99. 0', '98 0']})
Price
0 40.64
1 99.75
2 99.75
3 <NA>
4 test
5 99. 0
6 98 0
# find the value of the rows that are causing issues
problem_rows = df[pd.to_numeric(df['Price'], errors='coerce').isna()]
# display(problem_rows)
Price
3 <NA>
4 test
5 99. 0
6 98 0
Alternative
- Create an extra column and then use it to select the problem rows
df['Price_Updated'] = pd.to_numeric(df['Price'], errors='coerce')
Price Price_Updated
0 40.64 40.64
1 99.75 99.75
2 99.75 99.75
3 <NA> NaN
4 test NaN
5 99. 0 NaN
6 98 0 NaN
# find the problem rows
problem_rows = df.Price[df.Price_Updated.isna()]
Explanation
- Updating the column with
.to_numeric()
, and then checking for NaNs
will not tell you why the rows had to be coerced.
# update the Price row
df.Price = pd.to_numeric(df['Price'], errors='coerce')
# check for NaN
problem_rows = df.Price[df.Price.isnull()]
# display(problem_rows)
3 NaN
4 NaN
5 NaN
6 NaN
Name: Price, dtype: float64