I am using Azure Databricks and Scala. I wanna show() a Dataframe but I obtained an error that I can not understand and I would like to solve it. The lines of code that I have are:
println("----------------------------------------------------------------Printing schema")
df.printSchema()
println("----------------------------------------------------------------Printing dataframe")
df.show()
println("----------------------------------------------------------------Error before")
The Standard output is the following one, the message "----------------------------------------------------------------Error before" it does not appears.
> ----------------------------------------------------------------Printing schema
> root
> |-- processed: integer (nullable = false)
> |-- processDatetime: string (nullable = false)
> |-- executionDatetime: string (nullable = false)
> |-- executionSource: string (nullable = false)
> |-- executionAppName: string (nullable = false)
>
> ----------------------------------------------------------------Printing dataframe
> 2020-02-18T14:19:00.069+0000: [GC (Allocation Failure) [PSYoungGen: 1497248K->191833K(1789440K)] 2023293K->717886K(6063104K),
> 0.0823288 secs] [Times: user=0.18 sys=0.02, real=0.09 secs]
> 2020-02-18T14:19:40.823+0000: [GC (Allocation Failure) [PSYoungGen: 1637209K->195574K(1640960K)] 2163262K->721635K(5914624K),
> 0.0483384 secs] [Times: user=0.17 sys=0.00, real=0.05 secs]
> 2020-02-18T14:19:44.843+0000: [GC (Allocation Failure) [PSYoungGen: 1640950K->139092K(1809920K)] 2167011K->665161K(6083584K),
> 0.0301711 secs] [Times: user=0.11 sys=0.00, real=0.03 secs]
> 2020-02-18T14:19:50.910+0000: Track exception: Job aborted due to stage failure: Task 59 in stage 62.0 failed 4 times, most recent
> failure: Lost task 59.3 in stage 62.0 (TID 2672, 10.139.64.6, executor
> 1): java.lang.IllegalArgumentException: Illegal sequence boundaries:
> 1581897600000000 to 1581811200000000 by 86400000000
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage23.processNext(Unknown
> Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$15$$anon$2.hasNext(WholeStageCodegenExec.scala:659)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
> at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
> at org.apache.spark.scheduler.Task.run(Task.scala:112)
> at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
> Driver stacktrace:.
> 2020-02-18T14:19:50.925+0000: Track message: Process finished with exit code 1. Metric: Writer. Value: 1.0.