I am running a small PySpark script that extracts data from HBase tables and builds a PySpark DataFrame. When I try to save the DataFrame back to the local HDFS, the job fails with exit code 50.
The same operation succeeds for comparatively smaller DataFrames but fails for large ones. I can gladly share any code snippets, and I can also share a screenshot of the full environment from the Spark UI. Any help would be appreciated.
These are my Spark (2.0.0) properties, shown here as a dictionary. The application is deployed in yarn-client mode.
configuration = {
    'spark.executor.memory': '4g',
    'spark.executor.instances': '32',
    'spark.driver.memory': '12g',
    'spark.yarn.queue': 'default'
}
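For context, this is roughly how the dictionary is applied when the session is created. It is a simplified sketch, not my exact script; the app name and variable names are placeholders.

from pyspark.sql import SparkSession

# Sketch: feed each key/value from the configuration dictionary into the builder.
# Running the script directly against YARN uses client deploy mode by default.
builder = SparkSession.builder.appName('hbase_extract').master('yarn')
for key, value in configuration.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()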
After I obtain the DataFrame, I try to save it with:
df.write.save('user//hdfs//test_df', format='com.databricks.spark.csv', mode='append')
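I have also tried much larger resource allocations; the largest looked roughly like this (a sketch from memory; only the executor settings differ from the configuration above):

# Sketch of the largest allocation I tried -- the job still fails with exit code 50
configuration_large = {
    'spark.executor.memory': '16g',
    'spark.executor.instances': '128',
    'spark.driver.memory': '12g',
    'spark.yarn.queue': 'default'
}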
The following error block keeps repeating until the job fails. I suspect an OOM error, but even the larger allocation sketched above made no difference. Any workaround would be greatly appreciated.
Container exited with a non-zero exit code 50
17/09/25 15:19:35 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 64, fslhdppdata2611.imfs.micron.com): ExecutorLostFailure (executor 42 exited caused by one of the running tasks) Reason: Container marked as failed: container_e37_1502313369058_6420779_01_000043 on host: fslhdppdata2611.imfs.micron.com. Exit status: 50. Diagnostics: Exception from container-launch.
Container id: container_e37_1502313369058_6420779_01_000043
Exit code: 50
Stack trace: org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException: Launch container failed
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DefaultLinuxContainerRuntime.launchContainer(DefaultLinuxContainerRuntime.java:109)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DelegatingLinuxContainerRuntime.launchContainer(DelegatingLinuxContainerRuntime.java:89)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:392)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is hdfsprod
main : requested yarn user is hdfsprod
Getting exit code file...
Creating script paths...
Writing pid file...
Writing to tmp file /opt/hadoop/data/03/hadoop/yarn/local/nmPrivate/application_1502313369058_6420779/container_e37_1502313369058_6420779_01_000043/container_e37_1502313369058_6420779_01_000043.pid.tmp
Writing to cgroup task files...
Creating local dirs...
Launching container...
Getting exit code file...
Creating script paths...