
I have a function that converts the datatypes of a dataframe's columns to a specified schema in PySpark. The `cast` function silently turns an entry into null if it cannot convert it to the target datatype.

e.g. `F.col(col_name).cast(IntegerType())` will cast the column to Integer, and if a value is a Long it will turn it into null.

Is there any way to capture the cases where the cast produces null? In a data pipeline that runs daily, if those rows are not captured, they will silently become null and be passed on to downstream systems.

user3222101
  • how about `isNull()` after the `cast` step? – pissall Oct 05 '19 at 07:07
  • Possible duplicate of [Spark Equivalent of IF Then ELSE](https://stackoverflow.com/questions/39048229/spark-equivalent-of-if-then-else) – pault Oct 05 '19 at 11:24
  • you might want to cast using an UDF that raises an Error in case it cannot convert the data types – Paul Oct 06 '19 at 13:30

0 Answers