I have an HDFS directory containing 13.2 GB of data across 4 files. I am trying to read all of the files with the wholeTextFiles
method in Spark, but I am running into some issues.
This is my code:
val path = "/tmp/cnt/warehouse/"
// Read every file under path as (filePath, fileContent), asking for at least 32 partitions
val whole = sc.wholeTextFiles(path, 32)
// Split each file's content into lines
val data = whole.map(r => (r._1, r._2.split("\r\n")))
// Flatten over the keys (file paths) and print a sample
val x = data.flatMap(r => r._1)
x.take(1000).foreach(println)
Below is my spark-submit command:
spark2-submit \
--class SparkTest \
--master yarn \
--deploy-mode cluster \
--num-executors 32 \
--executor-memory 15G \
--driver-memory 25G \
--conf spark.yarn.maxAppAttempts=1 \
--conf spark.port.maxRetries=100 \
--conf spark.kryoserializer.buffer.max=1g \
--conf spark.yarn.queue=xyz \
SparkTest-1.0-SNAPSHOT.jar
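For reference, a quick way to double-check which of these settings actually take effect at runtime is to read them back from the SparkContext inside the job (just a sketch, assuming sc is the SparkContext created in SparkTest):

// Print the resource settings Spark resolved at runtime
println(sc.getConf.get("spark.executor.memory", "not set"))
println(sc.getConf.get("spark.executor.instances", "not set"))
println(sc.getConf.get("spark.driver.memory", "not set"))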
- Even though I set the minimum partitions to 32, the data ends up in only 4 partitions (the sketch after these questions shows how I check the count).
- Is my spark-submit configuration correct?
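Here is a minimal sketch of how I check the partition count, assuming whole from the code above (the repartition call is only an experiment, not necessarily the right fix):

// How many partitions did Spark actually create for the whole-file RDD?
println(s"partitions after read: ${whole.getNumPartitions}")
// Forcing 32 partitions after the read (this shuffles the whole-file records)
val repartitioned = whole.repartition(32)
println(s"partitions after repartition: ${repartitioned.getNumPartitions}")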
The error is below:
Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 113, , executor 37): ExecutorLostFailure (executor 37 exited caused by one of the running tasks) Reason: Container from a bad node: container_e599_1560551438641_35180_01_000057 on host: . Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_e599_1560551438641_35180_01_000057
Exit code: 52
Stack trace: ExitCodeException exitCode=52:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:604)
at org.apache.hadoop.util.Shell.run(Shell.java:507)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:789)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.__launchContainer__(LinuxContainerExecutor.java:399)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 52
.
Driver stacktrace: