
I am on a new environment upgraded from Spark 2.4 to Spark 3.0, and I am receiving these errors:

ERROR 1:

You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'yyyy-MM-dd hh:mm:ss aa' pattern in the DateTimeFormatter

Line causing this:

from_unixtime(unix_timestamp(powerappcapturetime_local, 'yyyy-MM-dd hh:mm:ss aa') + (timezoneoffset*60), 'yyyy-MM-dd HH:mm:ss') as powerappcapturetime
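(As far as I understand, Spark 3.0's new parser is based on java.time.DateTimeFormatter, where the AM/PM marker is written as a single 'a' rather than 'aa'. A minimal sketch of the same expression with the adjusted pattern, using a made-up sample value:)

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample value; the real data lives in powerappcapturetime_local
df = spark.createDataFrame([("2022-01-19 03:15:22 PM",)], ["powerappcapturetime_local"])

df.select(
    F.from_unixtime(
        F.unix_timestamp("powerappcapturetime_local", "yyyy-MM-dd hh:mm:ss a"),  # 'a', not 'aa'
        "yyyy-MM-dd HH:mm:ss",
    ).alias("powerappcapturetime")
).show(truncate=False)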

ERROR 2:

DataSource.Error: ODBC: ERROR [42000] [Microsoft][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 94.0 failed 4 times, most recent failure: Lost task 0.3 in stage 94.0 (TID 1203) (10.139.64.43 executor 3): org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse ' 01/19/2022' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:150)
at org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper$$anonfun$checkParsedDiff$1.applyOrElse(DateTimeFormatterHelper.scala:141)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
at org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter.$anonfun$parse$1(TimestampFormatter.scala:86)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at scala.Option.getOrElse(Option.scala:189)

Line causing this:

cast(to_date(TT_VALID_FROM_TEXT, 'MM/dd/yyyy') as timestamp) as ttvalidfrom
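(Worth noting: the value the new parser rejects, ' 01/19/2022', has a leading space that the old parser apparently tolerated. One way to make this parse without the LEGACY policy is to trim the text first; a sketch along the same lines, with a made-up sample row:)

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample reproducing the leading space from the error message
df = spark.createDataFrame([(" 01/19/2022",)], ["TT_VALID_FROM_TEXT"])

df.select(
    F.to_date(F.trim("TT_VALID_FROM_TEXT"), "MM/dd/yyyy")
     .cast("timestamp")
     .alias("ttvalidfrom")
).show()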

My code is Python with SQL embedded in it.

# Restore the pre-3.0 date/time parsing behavior for this Spark session
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

query_view_create = '''
CREATE OR REPLACE VIEW {0}.{1} AS
SELECT customername,
       cast(to_date(TT_VALID_FROM_TEXT, 'MM/dd/yyyy') as timestamp) AS ttvalidfrom,
       from_unixtime(unix_timestamp(powerappcapturetime_local, 'yyyy-MM-dd hh:mm:ss aa') + (timezoneoffset * 60), 'yyyy-MM-dd HH:mm:ss') AS powerappcapturetime
FROM {0}.{2}
'''.format(DATABASE_NAME, VIEW_NAME_10, TABLE_NAME_12, ENVIRONMENT)

print(query_view_create)

The LEGACY setting was added to fix datetime issues we see when using Spark 3.0 with Power BI that do not appear in Spark 2.4.

spark.sql(query_view_create)

The error still appears in Power BI when I import the table. I'm not sure what I can do to make this work without these errors showing up.

James Khan

1 Answer


@James Khan, Thanks for finding the source of the problem. Posting your discussion as an Answer to help other community members.

To set the legacy timeParserPolicy, either of the snippets below should work.

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")

OR

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

If you are still getting the same error after this, please check this similar SO thread.

Reference:
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/parameters/legacy_time_parser_policy

Rakesh Govindula