
I am trying to create a Spark DataFrame with multiple additional columns based on conditions like this:

    df
      .withColumn("name1", someCondition1)
      .withColumn("name2", someCondition2)
      .withColumn("name3", someCondition3)
      .withColumn("name4", someCondition4)
      .withColumn("name5", someCondition5)
      .withColumn("name6", someCondition6)
      .withColumn("name7", someCondition7)

When more than six .withColumn clauses are added, I am faced with the following exception:

org.codehaus.janino.JaninoRuntimeException: Code of method "()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator" grows beyond 64 KB

This problem has been reported elsewhere as well, e.g. in the Jira issue linked below.

Is there a property in Spark where I can configure this size?

Edit:

If even more columns are created, e.g. around 20, I no longer receive the aforementioned exception; instead, after 5 minutes of waiting, I get the following error:

java.lang.OutOfMemoryError: GC overhead limit exceeded

What I want to perform is spelling/error correction. Some simple cases can be handled easily via a map and replacement in a UDF. Still, several other cases with multiple chained conditions remain.
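
For illustration, the simple map-and-replace part could look like this (a minimal sketch; the lookup table and column name are made up):

    import org.apache.spark.sql.functions.{col, udf}

    // hypothetical lookup table of known misspellings -> corrections
    val corrections = Map("Viena" -> "Vienna", "Zurch" -> "Zurich")

    // UDF that replaces a value if it is a known misspelling, otherwise keeps it
    val correct = udf((s: String) => corrections.getOrElse(s, s))

    val fixed = df.withColumn("cityFixed", correct(col("city")))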

I will also follow up there: https://issues.apache.org/jira/browse/SPARK-18532

A minimal reproducible example can be found here: https://gist.github.com/geoHeil/86e5401fc57351c70fd49047c88cea05

Georg Heiler

1 Answer


This error is caused by whole-stage code generation (WholeStageCodegen) running into a JVM limitation.

Quick answer: no, you cannot change the limit. Please look at this question; 64 KB is the maximum method size in the JVM.

We must wait for a workaround in Spark; currently there is nothing you can change in the system parameters.
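
A commonly suggested mitigation, which does not lift the JVM limit itself, is to disable whole-stage code generation so Spark falls back to the interpreted evaluation path; this trades the error for slower execution. A minimal sketch, assuming Spark 2.x and a session named spark:

    // disable whole-stage codegen for this session (Spark 2.x);
    // avoids generating one huge method, at the cost of performance
    spark.conf.set("spark.sql.codegen.wholeStage", false)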

T. Gawęda