I am using a Jupyter notebook on EMR to process large chunks of data. While processing the data I see this error:
An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 108 tasks (1027.9 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
It seems I need to increase spark.driver.maxResultSize in the Spark config. How do I set spark.driver.maxResultSize from a Jupyter notebook?
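From what I understand, EMR notebooks talk to the cluster through Sparkmagic/Livy, so I am guessing a `%%configure` cell might be the way to do it. This is just a sketch; the 2g value is an arbitrary placeholder I picked:

```python
%%configure -f
{"conf": {"spark.driver.maxResultSize": "2g"}}
```

As far as I can tell, the `-f` flag forces the existing Livy session to restart so the new conf takes effect, but I have not confirmed this behaves the same on EMR.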
I have already checked this post: Spark 1.4 increase maxResultSize memory
Also, in an EMR notebook the Spark context is already provided. Is there any way to edit that context and increase spark.driver.maxResultSize?
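If replacing the pre-created context is possible at all, I imagine it would look roughly like the sketch below, where `spark` is the session the notebook provides and 2g is again a placeholder. I am not sure this works when the session is managed by Livy rather than a plain PySpark kernel:

```python
from pyspark.sql import SparkSession

# Stop the pre-created session; spark.driver.maxResultSize is a static
# config, so it cannot be changed on an already-running context.
spark.stop()

# Rebuild the session with the larger result-size limit.
spark = (SparkSession.builder
         .config("spark.driver.maxResultSize", "2g")
         .getOrCreate())
```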
Any leads would be very helpful.
Thanks