I am trying to use spark-ec2 to launch an EC2 cluster with Hadoop 2.x, so I ran:

./spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 --hadoop-major-version=2 launch my-spark-cluster

Then I found that there is an error in the Tachyon setup process:

Setting up tachyon
RSYNC'ing /root/tachyon to slaves...
ec2-52-1-147-16.compute-1.amazonaws.com
ec2-52-1-147-16.compute-1.amazonaws.com: Formatting Tachyon Worker @ ip-172-31-21-86.ec2.internal
ec2-52-1-147-16.compute-1.amazonaws.com: Removing local data under folder: /mnt/ramdisk/tachyonworker/
Formatting Tachyon Master @ ec2-52-1-14-186.compute-1.amazonaws.com
Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
    at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246)
    at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73)
    at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53)
    at tachyon.UnderFileSystem.get(UnderFileSystem.java:53)
    at tachyon.Format.main(Format.java:54)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
    at org.apache.hadoop.ipc.Client.call(Client.java:1070)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
    at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69)
    ... 3 more

I've searched related questions, and it seems that "Server IPC version 7 cannot communicate with client version 4" means that the server is using Hadoop 2.x and the client is using Hadoop 1.x. However, I built my Spark against Hadoop 2.4.0, and I also tried the official Spark builds pre-built for Hadoop 2.4.0 and later; both lead to the same error.
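To make that reading of the error concrete, here is a minimal Python sketch that pulls the two IPC versions out of the exception message and labels them. The function name `explain_ipc_mismatch` and the mapping table are hypothetical; the table only encodes the interpretation stated above (IPC version 4 for a Hadoop 1.x client, IPC version 7 for a Hadoop 2.x server) and is not an authoritative list of Hadoop IPC versions.

```python
import re

# Assumption: this mapping only encodes the interpretation given in the question
# (IPC version 4 -> Hadoop 1.x client, IPC version 7 -> Hadoop 2.x server).
IPC_TO_HADOOP = {4: "Hadoop 1.x", 7: "Hadoop 2.x"}

IPC_MISMATCH = re.compile(
    r"Server IPC version (\d+) cannot communicate with client version (\d+)"
)


def describe_ipc_version(version):
    """Label an IPC version, falling back to 'unknown' if it is not in the table."""
    return IPC_TO_HADOOP.get(version, "unknown (IPC version %d)" % version)


def explain_ipc_mismatch(log_text):
    """Return (server, client) labels for the first IPC mismatch in log_text, or None."""
    match = IPC_MISMATCH.search(log_text)
    if match is None:
        return None
    server_version, client_version = (int(v) for v in match.groups())
    return describe_ipc_version(server_version), describe_ipc_version(client_version)


if __name__ == "__main__":
    line = ("org.apache.hadoop.ipc.RemoteException: "
            "Server IPC version 7 cannot communicate with client version 4")
    print(explain_ipc_mismatch(line))
```

If this reading is right, the client side of the failing call (here Tachyon's HDFS client, per the stack trace) is the part built against Hadoop 1.x, regardless of which Hadoop version Spark itself was built with.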

By the way, the Hadoop version installed by setting --hadoop-major-version=2 is Hadoop 2.0.0-cdh4.2.0. Is this a problem? I tried passing 2.4 or 2.4.0 instead, but neither is recognized as a valid Hadoop version.

dtolnay
user3684014
  • I think this is a known issue. Watch [SPARK-3185](https://issues.apache.org/jira/browse/SPARK-3185) to be notified of a fix. – Nick Chammas Feb 13 '15 at 03:23
  • @Nick Chammas: and what about the second part of the question, regarding Hadoop 2.0.0 being used instead of 2.4.0? – Greg Dubicki Jun 04 '15 at 11:23
  • @GrzegorzDubicki - Don't know anything about that, but I can say that spark-ec2 recently merged in [support for launching YARN clusters](https://issues.apache.org/jira/browse/SPARK-3674), which may be relevant. – Nick Chammas Jun 07 '15 at 19:40

0 Answers