Spark on Yarn Container Failure

Question

For reference: I solved this issue by adding Netty 4.1.17 in hadoop/share/hadoop/common

No matter what jar I try and run (including the example from https://spark.apache.org/docs/latest/running-on-yarn.html), I keep getting an error regarding container failure when running Spark on Yarn. I get this error in the command prompt:

Diagnostics: Exception from container-launch.
Container id: container_1530118456145_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:585)
    at org.apache.hadoop.util.Shell.run(Shell.java:482)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I look at the logs, I then find this error:

Exception in thread "main" java.lang.NoSuchMethodError:io.netty.buffer.PooledByteBufAllocator.metric()Lio/netty/buffer/PooledByteBufAllocatorMetric;
    at org.apache.spark.network.util.NettyMemoryMetrics.registerMetrics(NettyMemoryMetrics.java:80)
    at org.apache.spark.network.util.NettyMemoryMetrics.<init>(NettyMemoryMetrics.java:76)
    at org.apache.spark.network.client.TransportClientFactory.<init>(TransportClientFactory.java:109)
    at org.apache.spark.network.TransportContext.createClientFactory(TransportContext.java:99)
    at org.apache.spark.rpc.netty.NettyRpcEnv.<init>(NettyRpcEnv.scala:71)
    at org.apache.spark.rpc.netty.NettyRpcEnvFactory.create(NettyRpcEnv.scala:461)
    at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:57)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:530)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:347)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:869)
    at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)

Any idea why this is happening? This is running on a pseudo-distributed cluster set up according to this tutorial: https://wiki.apache.org/hadoop/Hadoop2OnWindows. Spark runs fine locally, and seeing as this jar was provided with Spark, I doubt it's a problem within the jar. (Regardless, I added a Netty dependency inside another jar and I'm still getting the same error).

The only thing set in my spark-defaults.conf is spark.yarn.jars, which points to a hdfs directory where I uploaded all of Spark's jars. io.netty.buffer.PooledByteBufAllocator is contained within these jars.

Spark 2.3.1, Hadoop 2.7.6

score 5 · Accepted Answer · answered Jul 27 '18 at 20:46

5

I had exactly same issue. Previously I used Hadoop 2.6.5 and the compatible spark version, things worked out fine. When I switched to Hadoop 2.7.6, problem occurred. Not sure what is cause, but I copied to netty.4.1.17.Final jar file to the hadoop library folder then the problem goes away.

answered Jul 27 '18 at 20:46

ascetic652

472
1
5
18

Holy cow, finally found this comment after hours of searching, this did the trick. Thank you! – NateH06 Jul 17 '19 at 20:24

score 0 · Answer 2 · answered Jun 27 '18 at 18:38

0

Seems like you have multiple netty version on your classpath ,

mvn clean compile
Remove all and add latest one.

answered Jun 27 '18 at 18:38

vaquar khan

10,864
5
72
96

i'm using an example jar provided by spark, so i can't recompile it and i don't think that's the error. – wordsmith Jun 27 '18 at 18:47

score 0 · Answer 3 · answered Jun 27 '18 at 18:39

0

This may have the version problem between your yarn and spark. check the compatibility of the versions are installed.

I strongly suggest to read more about NoSuchMethodError and some other similar Exceptions like NoClassDefFoundError and ClassNotFoundException. This suggestions reason is that when you start using spark in different situations these are the much more confusing errors and exception for the people are not so experienced. NosuchMethodError

Of course caring a lot is the best practice strategy for a programmer absolutely the ones working on distributed systems like spark. Well Done. ;)

answered Jun 27 '18 at 18:39

Amin Heydari Alashti

881
1
9
20

Versions ~should~ be fine; I have Spark 2.3.1 built for Hadoop 2.7+ and Hadoop 2.7.6. I get the same error when using Spark 2.3.0 built for Hadoop 2.7+. So I guess it probably has something to do with configurations (or a lack thereof?). – wordsmith Jun 27 '18 at 20:44
Just looked again--the versions of Netty that Hadoop use and Spark use are different, with Hadoop's being lower than Spark's (4.0.23 vs. 4.1.17, 3.6.2 vs. 3.9.9). However, I don't want to upgrade Hadoop as that might cause other problems since Spark 2.3.1 is only built for Hadoop 2.7, presumably, and not Hadoop 2.8 or 2.9. – wordsmith Jun 27 '18 at 21:03
1

For reference--I added the 4.1.17 jar in Hadoop's library and deleted its 4.0.23, and the example jar I was running worked. This might be problematic at some point in the future, though. – wordsmith Jun 27 '18 at 21:11

Spark on Yarn Container Failure

3 Answers3

Linked