
I am loading a DataFrame like so,

df = sparkSession.read.format("jdbc")[...]

and then writing to a parquet file.

df.write.mode(writeMode).parquet(location)

All numeric columns in the Dataset have type DecimalType(38, 10), but when I try to write one specific table to a parquet file I get the following in the stack trace:

java.lang.IllegalArgumentException: requirement failed: Decimal precision 69 exceeds max precision 38

I am having trouble debugging this issue. How can I find the row(s) that are causing this exception?
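
One way to narrow this down is sketched below; it is not code from the post, and `MY_TABLE`, `ID`, `AMOUNT`, and `jdbcUrl` are placeholders for the real table, key column, decimal column, and connection options. The idea is to push the size check down to Oracle and bring the oversized values back as strings, so the read never has to build a `Decimal(38, 10)` for them:

```scala
// Placeholder query: filter in Oracle and return the key plus the value rendered as
// text, so no oversized value has to fit into DecimalType(38, 10) on the Spark side.
// 28 = 38 (precision) - 10 (scale) digits allowed before the decimal point.
val probeQuery =
  "(SELECT ID, TO_CHAR(AMOUNT) AS AMOUNT_STR FROM MY_TABLE WHERE ABS(AMOUNT) >= POWER(10, 28)) t"

val offending = sparkSession.read
  .format("jdbc")
  .option("url", jdbcUrl)          // reuse the same connection options as the failing read
  .option("dbtable", probeQuery)
  .load()

offending.show(truncate = false)
```

TO_CHAR without a format model may need tweaking for extremely large values, but since only the already-filtered rows come back, the read itself cannot hit the precision check.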

  • Could you provide some details about the input database and its schema? – zero323 Jan 18 '18 at 22:48
  • @user6910411 This is an Oracle database, and some rows contain very large decimal values, so I expect it is being read properly. – m_kinsey Jan 18 '18 at 22:55
  • In this case the only obvious point of failure is the transformation from `ResultSet` to the internal format, which means there is a problem there, and it is out of reach (outside the debugger, for example). And JDBC is "very good" at misleading Spark (MySQL is the most obvious example, but others have their own problems), and there are different quirks of Spark's JDBC dialects on top of that. So trust me, I asked for a reason. – zero323 Jan 18 '18 at 23:07
  • I found [this issue](https://issues.apache.org/jira/browse/SPARK-20427) in the Spark JIRA. It is likely the culprit, but I am unsure how to fix it for my situation. – m_kinsey Jan 18 '18 at 23:14
  • Have you tried with [2.3.0 RC](http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Spark-2-3-0-RC1-td23168.html) and / or downcasting numerics [in the query](https://stackoverflow.com/q/38729436/6910411)? – zero323 Jan 18 '18 at 23:26
  • Hello @m_kinsey, have you got any solution for this, I'm currently stuck with the same issue. Any help would be appreciated. – chaitra k Sep 10 '20 at 19:03
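
A rough sketch of the downcasting workaround suggested in the comments above, with the same placeholder names (`MY_TABLE`, `ID`, `AMOUNT`, `jdbcUrl`): if the extra digits are all in the fractional part, rounding down to `NUMBER(38, 10)` inside the pushed-down query is enough; if the integer part itself needs more than 28 digits, Oracle will reject this cast too, and a lossy cast (for example to `BINARY_DOUBLE`) or filtering those rows out is what remains.

```scala
// Cast inside the query Oracle executes, so the values handed to Spark already fit
// DecimalType(38, 10). Names are placeholders, not from the original question.
val downcastQuery =
  "(SELECT ID, CAST(AMOUNT AS NUMBER(38, 10)) AS AMOUNT FROM MY_TABLE) t"

val df = sparkSession.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", downcastQuery)
  .load()

df.write.mode(writeMode).parquet(location)
```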

0 Answers