java.lang.IllegalArgumentException: Illegal sequence boundaries Spark

Question

I am using Azure Databricks and Scala. I want to show() a DataFrame, but I get an error that I cannot understand and would like to solve. The code I have is:

    println("----------------------------------------------------------------Printing schema")
    df.printSchema()
    println("----------------------------------------------------------------Printing dataframe")
    df.show()
    println("----------------------------------------------------------------Error before")

The standard output is shown below; the message "----------------------------------------------------------------Error before" never appears.

>     ----------------------------------------------------------------Printing schema
>     root
>      |-- processed: integer (nullable = false)
>      |-- processDatetime: string (nullable = false)
>      |-- executionDatetime: string (nullable = false)
>      |-- executionSource: string (nullable = false)
>      |-- executionAppName: string (nullable = false)
>     
>     ----------------------------------------------------------------Printing dataframe
>     2020-02-18T14:19:00.069+0000: [GC (Allocation Failure) [PSYoungGen: 1497248K->191833K(1789440K)] 2023293K->717886K(6063104K), 0.0823288 secs] [Times: user=0.18 sys=0.02, real=0.09 secs]
>     2020-02-18T14:19:40.823+0000: [GC (Allocation Failure) [PSYoungGen: 1637209K->195574K(1640960K)] 2163262K->721635K(5914624K), 0.0483384 secs] [Times: user=0.17 sys=0.00, real=0.05 secs]
>     2020-02-18T14:19:44.843+0000: [GC (Allocation Failure) [PSYoungGen: 1640950K->139092K(1809920K)] 2167011K->665161K(6083584K), 0.0301711 secs] [Times: user=0.11 sys=0.00, real=0.03 secs]
>     2020-02-18T14:19:50.910+0000: Track exception: Job aborted due to stage failure: Task 59 in stage 62.0 failed 4 times, most recent failure: Lost task 59.3 in stage 62.0 (TID 2672, 10.139.64.6, executor 1): java.lang.IllegalArgumentException: Illegal sequence boundaries: 1581897600000000 to 1581811200000000 by 86400000000
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage23.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$15$$anon$2.hasNext(WholeStageCodegenExec.scala:659)
>       at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>       at org.apache.spark.scheduler.Task.doRunTask(Task.scala:139)
>       at org.apache.spark.scheduler.Task.run(Task.scala:112)
>       at org.apache.spark.executor.Executor$TaskRunner$$anonfun$13.apply(Executor.scala:497)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1526)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:503)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>     
>     Driver stacktrace:.
>     2020-02-18T14:19:50.925+0000: Track message: Process finished with exit code 1. Metric: Writer. Value: 1.0.

It may help if you can show us how you reproduce the problem. – Luis Miguel Mejía Suárez, Feb 18 '20 at 14:30
Well, the first step to solving any problem is being able to minimise it and make it reproducible. – Luis Miguel Mejía Suárez, Feb 18 '20 at 14:35

score 6 · Answer 1 · answered Apr 05 '20 at 12:03

It's hard to know exactly without seeing your code, but I had a similar error and the other answer (about the int being out of range) led me astray.

The java.lang.IllegalArgumentException you are getting is confusing but is actually quite specific:

Illegal sequence boundaries: 1581897600000000 to 1581811200000000 by 86400000000

This error is complaining that you are using the sequence() Spark SQL function and telling it to go from 1581897600000000 to 1581811200000000 by 86400000000. It's easy to miss because of the big numbers, but this is an instruction to go from a larger number to a smaller number in increments of a positive number, e.g., from 12 to 6 by 3.

This is not allowed according to the Databricks documentation:

> start - an expression. The start of the range.
>
> stop - an expression. The end of the range (inclusive).
>
> step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it’s 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
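The documented rule can be illustrated with a plain-Scala model for whole numbers. `SequenceRule.sequence` below is a hypothetical helper written for this explanation, not Spark's implementation: `stop` is inclusive, the default step is +1 or -1 depending on direction, and an explicit step that points away from `stop` raises an `IllegalArgumentException` with a message shaped like Spark's. How a zero step is handled is this sketch's own choice.

```scala
object SequenceRule {
  // Hypothetical model of the documented rule (not Spark's code).
  // stop is inclusive; without an explicit step, the step is +1 or -1 depending on direction.
  def sequence(start: Long, stop: Long, stepOpt: Option[Long] = None): List[Long] = {
    val step = stepOpt.getOrElse(if (start <= stop) 1L else -1L)
    val legal =
      (step > 0 && start <= stop) ||
      (step < 0 && start >= stop) ||
      (step == 0 && start == stop) // zero step: this sketch only accepts it when start == stop
    if (!legal)
      throw new IllegalArgumentException(s"Illegal sequence boundaries: $start to $stop by $step")
    if (step == 0) List(start)
    else (start to stop by step).toList
  }

  def main(args: Array[String]): Unit = {
    println(sequence(6, 12, Some(3L)))  // List(6, 9, 12)
    println(sequence(12, 6))            // default step is -1 because start > stop
    println(sequence(12, 6, Some(-3L))) // List(12, 9, 6)
    // Same shape as the reported error: larger start, smaller stop, positive step
    try println(sequence(12, 6, Some(3L)))
    catch {
      case e: IllegalArgumentException => println(e.getMessage) // Illegal sequence boundaries: 12 to 6 by 3
    }
  }
}
```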

Additionally, I believe the other answer's focus on the int column is misleading. The large numbers mentioned in the illegal sequence error look like they are coming from a date column. You don't have any DateType columns but your string columns are named like date columns; presumably you are using them in a sequence function and they are getting coerced into dates.
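The big numbers are easier to interpret once decoded. Below is a small Scala sketch using `java.time`, assuming (as with Spark's internal timestamp representation) that the values are microseconds since the Unix epoch; `DecodeSequenceBounds` is an illustrative name. Under that assumption the failing call asked for a sequence from 2020-02-17 down to 2020-02-16 with a step of one day, which fits the date-column theory.

```scala
import java.time.{Duration, Instant}
import java.time.temporal.ChronoUnit

object DecodeSequenceBounds {
  // Assumption: the values in the message are microseconds since 1970-01-01T00:00:00Z
  def microsToInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  def microsToDuration(micros: Long): Duration =
    Duration.of(micros, ChronoUnit.MICROS)

  def main(args: Array[String]): Unit = {
    // Numbers copied from "Illegal sequence boundaries: ... to ... by ..."
    println(microsToInstant(1581897600000000L)) // 2020-02-17T00:00:00Z (start)
    println(microsToInstant(1581811200000000L)) // 2020-02-16T00:00:00Z (stop)
    println(microsToDuration(86400000000L))     // PT24H (step: one day)
  }
}
```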

score 2 · Answer 2 · answered Apr 14 '21 at 02:32

You can get this error when you call

    sequence(start_date, end_date, interval)

on a table in which some rows have `start_date` earlier than `end_date` and others have it later.

When the function is applied with a fixed interval, all of the date ranges must go in the same direction (all ascending or all descending), not a mix.
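The situation this answer describes can be reproduced with a small Spark sketch, assuming Spark 2.4 or later (where `functions.sequence` is available) and a local `SparkSession`; in a Databricks notebook you would reuse the existing `spark` session instead of building one. The `MixedRanges` object, the `start_date`/`end_date` column names and the two rows are made up for illustration. With a fixed `interval 1 day` step the descending row should fail, while leaving the step out lets Spark pick +1 day or -1 day per row, as described in the documentation quoted in the first answer.

```scala
import java.sql.Date
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, sequence}

object MixedRanges {
  // Hypothetical data: the first row runs forwards in time, the second backwards
  def ranges(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (Date.valueOf("2020-02-16"), Date.valueOf("2020-02-17")),
      (Date.valueOf("2020-02-17"), Date.valueOf("2020-02-16"))
    ).toDF("start_date", "end_date")
  }

  // Fixed positive step: the second row fails with
  // "Illegal sequence boundaries" once an action (show, collect, write) runs
  def withFixedStep(spark: SparkSession): DataFrame =
    ranges(spark).withColumn("days",
      sequence(col("start_date"), col("end_date"), expr("interval 1 day")))

  // No explicit step: Spark picks +1 day or -1 day separately for each row
  def withDefaultStep(spark: SparkSession): DataFrame =
    ranges(spark).withColumn("days", sequence(col("start_date"), col("end_date")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("mixed-ranges").getOrCreate()
    withDefaultStep(spark).show(false)
    spark.stop()
  }
}
```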

score 0 · Answer 3 · answered Feb 18 '20 at 14:43

Your schema expects an int, and an int in Java has a range of [-2,147,483,648 to +2,147,483,647].

So I would change the schema from int to long.
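If you want to try this suggestion, a minimal sketch of widening `processed` (the only `IntegerType` column in the posted schema) to `LongType` is shown below, both with the DataFrame API and with a Spark SQL `CAST`. It assumes a Spark dependency; the `WidenProcessed` object, the one-row DataFrame and the `executions` view name are hypothetical. Keep in mind that the first answer argues the int column is not what triggers this particular exception.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

object WidenProcessed {
  // Cast the IntegerType column `processed` to LongType, keeping its name
  def widen(df: DataFrame): DataFrame =
    df.withColumn("processed", col("processed").cast(LongType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("widen-processed").getOrCreate()
    import spark.implicits._
    // Hypothetical one-row stand-in for the question's DataFrame
    val df = Seq((1, "2020-02-18")).toDF("processed", "processDatetime")
    widen(df).printSchema() // processed is now long

    // Spark SQL equivalent, via a temporary view
    df.createOrReplaceTempView("executions")
    spark.sql("SELECT CAST(processed AS BIGINT) AS processed, processDatetime FROM executions").printSchema()
    spark.stop()
  }
}
```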

answered Feb 18 '20 at 14:43

nathan_gs

How do you know that is the error? Do you know how I can find which column is causing the issue? Would you change the schema like this: `spark.sql("SELECT (column_int as long)")`? – Eric Bellet Feb 18 '20 at 15:44
You only have a single column with `int`, named `processed`. However I am not sure how the int is generated or where it came from. It is a guess, based on the available inputs. – nathan_gs Feb 18 '20 at 20:11
