
When I run this Spark code in Scala:

    df.withColumn(x, when(col(x).isin(values: _*), col(x)).otherwise(lit(null).cast(StringType)))

I get this error:

     java.lang.RuntimeException: Compiling "GeneratedClass": Code of method
 "apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
        at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
        at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)

df: a Spark Dataset

x: the name of a StringType column; each row holds a value like "US,Washington,Seattle"

values: Array[String]

Hossein
  • You may want to check out https://stackoverflow.com/questions/50891509/apache-spark-codegen-stage-grows-beyond-64-kb – Lars Skaug Jul 25 '20 at 21:34
  • This can happen when your code is too long without any actions. You should cache your dataframe at some point. – Lamanus Jul 26 '20 at 14:19

1 Answer


This is a known issue caused by the generated bytecode for the query plan growing past the JVM's 64 KB method-size limit. The common workaround is to add a checkpoint, i.e., save your dataframe and read it back, which truncates the lineage so code generation starts from a fresh plan.

See the following for further detail: Apache Spark Codegen Stage grows beyond 64 KB
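A minimal sketch of the workaround, assuming a local session; the checkpoint directory, column name, and sample values are illustrative, not from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StringType

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("checkpoint-sketch")
      .master("local[*]")
      .getOrCreate()

    // checkpoint() requires a checkpoint directory (illustrative path).
    spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

    import spark.implicits._
    val x = "location"                                                // hypothetical column name
    val values = Array("US,Washington,Seattle", "US,Oregon,Portland") // hypothetical allow-list
    var df = Seq("US,Washington,Seattle", "FR,IDF,Paris").toDF(x)

    // The expression from the question.
    df = df.withColumn(x,
      when(col(x).isin(values: _*), col(x))
        .otherwise(lit(null).cast(StringType)))

    // Truncate the lineage so codegen does not accumulate across stages.
    df = df.checkpoint()
    // Lighter-weight alternative, as suggested in the comments:
    //   df = df.cache(); df.count()

    df.show()
    spark.stop()
  }
}
```

Where exactly to checkpoint depends on the job; the usual pattern is to do it after the long chain of transformations that triggers the error.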

Lars Skaug
  • I knew about the issue; I was wondering whether there is an alternative to my code (i.e., when/otherwise) that does not cause this error. – Hossein Aug 05 '20 at 21:20
  • Well, as Lamanus pointed out as well, caching your dataframe should prevent the error. – Lars Skaug Aug 05 '20 at 21:36