
I am loading a DataFrame like so,

df = sparkSession.read.format("jdbc")[...]

and then writing to a parquet file.

df.write.mode(writeMode).parquet(location)

All numeric columns in the Dataset have type DecimalType(38, 10), but when I try to write one specific table to a parquet file I get the following in the stack trace:

java.lang.IllegalArgumentException: requirement failed: Decimal precision 69 exceeds max precision 38

I am having trouble debugging this issue. How can I find the row(s) that are causing this exception?
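
One way to narrow this down is sketched below; it is not code from the post, and `MY_TABLE`, `ID`, `AMOUNT`, and `jdbcUrl` are placeholders for the real table, key column, decimal column, and connection options. The idea is to push the size check down to Oracle and bring the oversized values back as strings, so the read never has to build a `Decimal(38, 10)` for them:

```scala
// Placeholder query: filter in Oracle and return the key plus the value rendered as
// text, so no oversized value has to fit into DecimalType(38, 10) on the Spark side.
// 28 = 38 (precision) - 10 (scale) digits allowed before the decimal point.
val probeQuery =
  "(SELECT ID, TO_CHAR(AMOUNT) AS AMOUNT_STR FROM MY_TABLE WHERE ABS(AMOUNT) >= POWER(10, 28)) t"

val offending = sparkSession.read
  .format("jdbc")
  .option("url", jdbcUrl)          // reuse the same connection options as the failing read
  .option("dbtable", probeQuery)
  .load()

offending.show(truncate = false)
```

TO_CHAR without a format model may need tweaking for extremely large values, but since only the already-filtered rows come back, the read itself cannot hit the precision check.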

  • Could you provide some details about the input database and its schema? – zero323 Jan 18 '18 at 22:48
  • @user6910411 This is an Oracle database, and some rows contain very large decimal values, so I expect it is being read properly. – m_kinsey Jan 18 '18 at 22:55
  • In this case the only obvious point of failure is the transformation from `ResultSet` to the internal format, which means there is a problem there, and it is out of reach (outside the debugger, for example). And JDBC is "very good" at misleading Spark (MySQL is the most obvious example, but others have their own problems), and there are different quirks of Spark's JDBC dialects on top of that. So trust me, I asked for a reason. – zero323 Jan 18 '18 at 23:07
  • I found [this issue](https://issues.apache.org/jira/browse/SPARK-20427) in the Spark JIRA. It is likely the culprit, but I am unsure how to fix it for my situation. – m_kinsey Jan 18 '18 at 23:14
  • Have you tried with [2.3.0 RC](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-3-0-RC1-td23168.html) and / or downcasting numerics [in the query](https://stackoverflow.com/q/38729436/6910411)? – zero323 Jan 18 '18 at 23:26
  • Hello @m_kinsey, have you got any solution for this, I'm currently stuck with the same issue. Any help would be appreciated. – chaitra k Sep 10 '20 at 19:03
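
A rough sketch of the downcasting workaround suggested in the comments above, with the same placeholder names (`MY_TABLE`, `ID`, `AMOUNT`, `jdbcUrl`): if the extra digits are all in the fractional part, rounding down to `NUMBER(38, 10)` inside the pushed-down query is enough; if the integer part itself needs more than 28 digits, Oracle will reject this cast too, and a lossy cast (for example to `BINARY_DOUBLE`) or filtering those rows out is what remains.

```scala
// Cast inside the query Oracle executes, so the values handed to Spark already fit
// DecimalType(38, 10). Names are placeholders, not from the original question.
val downcastQuery =
  "(SELECT ID, CAST(AMOUNT AS NUMBER(38, 10)) AS AMOUNT FROM MY_TABLE) t"

val df = sparkSession.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", downcastQuery)
  .load()

df.write.mode(writeMode).parquet(location)
```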

0 Answers