
In the test program included below, I am attempting to copy a file from the local disk to HDFS. The code is as follows:

package foo.foo1.foo2.test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestTestTest {

    public static void main(String[] args) {

        // Local source file and HDFS destination directory
        String srcLocation = "foo";
        String destination = "hdfs:///tmp/";

        FileSystem hdfs = null;

        // Point the client at the cluster's NameNode
        Configuration configuration = new Configuration();
        configuration.set("fs.default.name", "hdfs://namenode:54310/");

        try {
            hdfs = FileSystem.get(configuration);
        } catch (IOException e2) {
            e2.printStackTrace();
            return;
        }

        Path srcpath = new Path(srcLocation);
        Path dstpath = new Path(destination);

        // Copy the local file into HDFS
        try {
            hdfs.copyFromLocalFile(srcpath, dstpath);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This fails with the following exception:

java.io.IOException: Call to namenode/10.1.1.1:54310 failed on local exception:     java.io.EOFException
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
    at org.apache.hadoop.ipc.Client.call(Client.java:743)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy0.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:95)
    at foo.foo1.foo2.test.TestTestTest.main(TestTestTest.java:22)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)

My question is deceptively simple: What is causing this, and how can I make this program work? From what little information I've been able to find, I gather that there is a problem connecting to HDFS, and that this has something to do with the fs.default.name property in the Configuration. Below is the relevant section of my core-site.xml file:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:54310</value>
  </property>

</configuration>
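As a point of reference, the client can also pick up this same file directly via Configuration.addResource instead of hard-coding fs.default.name, which rules out the client and cluster disagreeing about the NameNode address. A minimal sketch (the /etc/hadoop/conf path is only an example; use wherever your cluster's configuration files actually live):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConfCheck {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed location of the cluster's config file; adjust for your install.
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));

        FileSystem fs = FileSystem.get(conf);
        // Should print hdfs://namenode:54310 if the cluster config was picked up.
        System.out.println(fs.getUri());
    }
}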

Perhaps of special interest is the fact that if I bundle all the jars in my classpath into one mega-jar, and run this program via the hadoop command, it works just fine. So what am I doing wrong?

rmtheis
david
  • Are you able to reach HDFS through the command-line utility? If so, are you compiling your code against the same version of Hadoop that you are running on your cluster? – Matt D Mar 14 '12 at 21:28
  • You know, I was prepared to swear they were the same versions. I mean, I'm using the name node as my development machine, so surely I would be developing against the same version! But before I put my name to that, I decided to check, and wouldn't you know it: the cluster is running the Cloudera jars, and the jars I was provided for development are not the same. Putting the Cloudera jar on the classpath instead fixed the issue! Feel free to put that in an "answer". – david Mar 14 '12 at 22:04

4 Answers


Make sure you are compiling against the same Hadoop version that you are running on your cluster.
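For example, a quick way to see which Hadoop version your client code actually links against at runtime, so you can compare it with what `hadoop version` reports on the cluster, is a small check like this (just a sketch; the class name is arbitrary):

import org.apache.hadoop.util.VersionInfo;

public class ClientVersionCheck {

    public static void main(String[] args) {
        // Version of the Hadoop jars on the client's classpath
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
        System.out.println("Build details:  " + VersionInfo.getBuildVersion());
    }
}

If the version printed here does not match the cluster's, the RPC handshake can fail with exactly the kind of EOFException shown in the question.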

Matt D
  • In our case it was trickier: we got a pig-udf.jar that was built with dependencies against another version of the Hadoop client. Maven saw no conflict between our version of hadoop-core.jar (0.20.2-cdh3u2) and the pig-udf.jar, and when they were both put in the same folder on the classpath and loaded with `-cp lib/*`, the results were non-deterministic... – ihadanny Jul 29 '12 at 15:34
  • I had the same `java.io.EOFException` when running an HDFS file streaming job from Spark Job Server against Apache Spark on Hadoop. Only when I got Spark Job Server to use a version of Spark on its classpath that had been built against exactly the same version of Hadoop as the target runtime Spark cluster did the EOFException go away and the file streaming job work. – snark Jun 29 '15 at 11:10

Make sure you are handling the proper exception (one that implements the Serializable interface) and that you are using a matching version of Hadoop.


The problem in your code may be the "foo" directory path.

can

I faced a similar issue in the past, but the problem was on my end: I had two different versions of Hadoop installed. I had started the daemons from the earlier version while my bash_profile was pointing to the new one, and this issue occurred. So make sure you are not dealing with a version mismatch.

Karn_way