
I am applying a function to a pandas column, but it raises an error on certain values. I am not sure what is causing the error, so to fix it I want to see the value on which the error is raised.

For example, suppose I have a column in the DataFrame in which every value is a number stored as a string, except for one.

DataFrame:

|col num|col random|
|:-:|:-:|
|'8'|0.2|
|'9'|0.9|
|'abcd'|0|
|'10'|1|

I then apply the int function to that column, which raises an error on the non-numeric value.

df['col num'].apply(int)

For debugging, I want to get "abcd".

Note that the above scenario is just an example; I am not actually trying to convert the data type of a column.

aaossa
Darkstar Dream

1 Answer


In this particular example the error does provide information on which value could not be cast:

import pandas as pd

df = pd.DataFrame({"col num": ["1", "2", "3", "abcd", "10"]})
df["col num"].astype(int)

# ValueError: invalid literal for int() with base 10: 'abcd'

If that's not the case for you, it might depend on the pandas version.


In general, define a separate function and apply it to the DataFrame/Series; inside it you can include all the printing and debugging you want:

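def catch_int_casting(x):
    try:
        return int(x)
    except Exception as e:
        # Report the error message and the offending input, then fall
        # through so that the result for this row is None.
        print(f"The following value raised exception \"{e}\"")
        print(x)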

df[['col num']].apply(catch_int_casting, axis=1)

# Produces:

# The following value raised exception "invalid literal for int() with base 10: 'abcd'"
# col num    abcd
# Name: 3, dtype: object

Note that I'm applying this to a single-column DataFrame (df[['col num']]) rather than to the Series, so that the printed row also shows its index label (Name: 3).
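If you would rather collect the failing values than print them, the same try/except idea can be wrapped in a small helper. The following is only a sketch: `find_failures` is a hypothetical name, not a pandas function, and the sample data is taken from the table in the question. It iterates over `Series.items()`, so each scalar value is passed to the function together with its index label. That also sidesteps calling `int` on a one-element row Series, which newer pandas releases flag with a deprecation warning.

```python
import pandas as pd


def find_failures(series, func):
    """Hypothetical helper: list the index labels and values on which func raises."""
    failures = []
    for idx, value in series.items():
        try:
            func(value)
        except Exception as exc:
            # Keep the label, the raw value, and the exception text for inspection.
            failures.append({"index": idx, "value": value, "error": repr(exc)})
    return pd.DataFrame(failures, columns=["index", "value", "error"])


# Sample data mirroring the question's table.
df = pd.DataFrame(
    {"col num": ["8", "9", "abcd", "10"], "col random": [0.2, 0.9, 0, 1]}
)

failures = find_failures(df["col num"], int)
print(failures)
```

Because the result is a DataFrame, something like `df.loc[failures["index"]]` should then pull up the full offending rows, assuming the index labels are unique.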

mcsoini