
When running my job, I am getting the following exception:

Exception in User Class: org.apache.spark.SparkException : Job aborted due to stage failure: Task 32 in stage 2.0 failed 4 times, most recent failure: Lost task 32.3 in stage 2.0 (TID 50) (10.100.1.48 executor 8): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet INT96 files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.int96RebaseModeInRead to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. Or set spark.sql.legacy.parquet.int96RebaseModeInRead to 'CORRECTED' to read the datetime values as it is.

I have tried to apply the requested configuration value, as follows:

    val conf = new SparkConf()
    conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")

    val spark: SparkContext = new SparkContext(conf)
    //Get current sparkconf which is set by glue
    
    val glueContext: GlueContext = new GlueContext(spark)
    val args = GlueArgParser.getResolvedOptions(
      sysArgs, 
      Seq("JOB_NAME").toArray
    )
    Job.init(args("JOB_NAME"), glueContext, args.asJava)

but the same error occurs. I have also tried setting it to "CORRECTED" via the same approach.

It seems that the config is not making its way into the Spark execution. What is the proper way to set Spark config values from a Scala Spark job on Glue?

jamesbascle

2 Answers


When migrating between versions, it is best to check the migration guides published by AWS. In your case, the setting can be applied in the Glue job properties by passing the parameters below, choosing CORRECTED or LEGACY as required. To set them, navigate to Glue console -> Jobs -> click the job -> Job details -> Advanced properties -> Job parameters.

- Key: --conf
- Value: spark.sql.legacy.parquet.int96RebaseModeInRead=[CORRECTED|LEGACY] --conf spark.sql.legacy.parquet.int96RebaseModeInWrite=[CORRECTED|LEGACY] --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=[CORRECTED|LEGACY]
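
The Value field above packs several settings into one `--conf` job parameter: the first setting is written bare and each later one is prefixed with its own `--conf`. As a minimal sketch of that string format only, the plain-Scala helper below assembles such a value. The object name `GlueConfParameter` and the function `buildValue` are hypothetical names introduced for illustration, not part of any Glue or Spark API.

```scala
// Hypothetical helper: builds the Value text for a single "--conf" Glue job parameter.
object GlueConfParameter {

  // Joins key=value pairs so that every setting after the first is
  // preceded by " --conf ", matching the Value format shown above.
  def buildValue(settings: Seq[(String, String)]): String =
    settings.map { case (key, value) => s"$key=$value" }.mkString(" --conf ")

  def main(args: Array[String]): Unit = {
    // Use "LEGACY" instead of "CORRECTED" if the files need rebasing.
    val settings = Seq(
      "spark.sql.legacy.parquet.int96RebaseModeInRead" -> "CORRECTED",
      "spark.sql.legacy.parquet.int96RebaseModeInWrite" -> "CORRECTED",
      "spark.sql.legacy.parquet.datetimeRebaseModeInRead" -> "CORRECTED"
    )
    println("Key:   --conf")
    println(s"Value: ${buildValue(settings)}")
  }
}
```

The printed value can then be pasted into the Job parameters form. Reports in this thread are mixed on whether Glue picks it up, so it is worth checking the job logs afterwards.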

Refer to the guide below for more information:

https://docs.aws.amazon.com/glue/latest/dg/migrating-version-30.html#migrating-version-30-from-20

Prabhakar Reddy
  • I tried this approach, but it did not seem to get the config into the system either, unfortunately. The AWS docs also say that conf is an internal parameter that you should not set. – jamesbascle Sep 20 '22 at 15:48
  • This property should work. Can you share a screenshot of how you are setting it? Also check the job logs to see whether it is being passed. What error do you get after using these properties? – Prabhakar Reddy Sep 21 '22 at 05:48
  • So, you did denote a couple of the correct settings here, but setting them via the job parameters wasn't working. I'm posting another answer with what ended up working for us, which came from AWS support. – jamesbascle Oct 04 '22 at 00:41
  • This works! I've been dealing with the same error for a few days. Restarting the Spark context is not an option in Glue, so you have to pass these settings as part of the parameters for the job. It sounds weird, but internally Glue sets them for you if --conf is found in the parameter list. – jose.arias Feb 14 '23 at 15:18

This code at the top of my Glue job seems to have done the trick:

    val conf = new SparkConf()

    //alternatively, use LEGACY if that is required
    conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.int96RebaseModeInWrite", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
    conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")

    val spark: SparkContext = new SparkContext(conf)

    val glueContext: GlueContext = new GlueContext(spark)
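
Since the original problem was that the settings did not seem to reach Spark, it may help to print what the context actually holds. The following is a rough sketch, assuming only spark-core on the classpath; the object `RebaseConfCheck` and its `describe` function are hypothetical names introduced here, not something from the Glue library.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: reports the four rebase settings as seen by a SparkConf.
object RebaseConfCheck {

  val rebaseKeys: Seq[String] = Seq(
    "spark.sql.legacy.parquet.int96RebaseModeInRead",
    "spark.sql.legacy.parquet.int96RebaseModeInWrite",
    "spark.sql.legacy.parquet.datetimeRebaseModeInRead",
    "spark.sql.legacy.parquet.datetimeRebaseModeInWrite"
  )

  // One "key = value" line per setting; unset keys are shown as "<not set>".
  def describe(conf: SparkConf): Seq[String] =
    rebaseKeys.map(key => s"$key = ${conf.getOption(key).getOrElse("<not set>")}")

  def main(args: Array[String]): Unit = {
    // Standalone demo: loadDefaults = false ignores spark.* system properties.
    val conf = new SparkConf(false)
      .set("spark.sql.legacy.parquet.int96RebaseModeInRead", "CORRECTED")
    describe(conf).foreach(println)
  }
}
```

Inside the job, calling `RebaseConfCheck.describe(spark.getConf).foreach(println)` right after the `SparkContext` is created should write the effective values to the driver log, where `spark` is the `SparkContext` from the code above.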

jamesbascle