
I'm having a very strange problem. I'm using the dfs-datastores Pail abstraction to write data to HDFS from Java, although I don't think Pail itself is relevant to the problem.

When it calls `org.apache.hadoop.fs.FileSystem getFS(java.lang.String path)` with a path on my local filesystem, it pauses for about two minutes, apparently doing nothing, and then returns. This is on my laptop.

The weird thing is that it ran quickly today when I was on my office network, but now that I'm home the pause is back. I'm running Ubuntu 10.10 64-bit with Java 1.7.

Anyone have any ideas what it's doing? What could be different between being at work and being at home?

UPDATE: I've been stepping through the code with the debugger, and the trouble seems to be in `Configuration.loadResource()`. It gets called multiple times, and it can take 5-10 seconds to return.
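A lighter-weight way than the debugger to confirm which call is slow is to time it directly. This is a minimal sketch with a hypothetical helper (`CallTimer` is not part of Hadoop or Pail); it uses only the JDK and sticks to Java 7 syntax to match the setup in the question. The `Thread.sleep` stands in for the real call so the sketch runs without Hadoop on the classpath.

```java
import java.util.concurrent.Callable;

/** Hypothetical debugging helper: measures how long a single call takes. */
public final class CallTimer {

    /** Result of a timed call: the value it returned and the elapsed wall-clock time. */
    public static final class Timed<T> {
        public final T value;
        public final long millis;

        Timed(T value, long millis) {
            this.value = value;
            this.millis = millis;
        }
    }

    /** Runs the callable once and measures elapsed time with System.nanoTime(). */
    public static <T> Timed<T> time(Callable<T> call) {
        long start = System.nanoTime();
        T value;
        try {
            value = call.call();
        } catch (Exception e) {
            // Wrap checked exceptions so callers of this helper stay simple.
            throw new RuntimeException(e);
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
        return new Timed<T>(value, elapsedMillis);
    }

    public static void main(String[] args) {
        // Placeholder for the suspect call; replace the body with the real work.
        Timed<String> t = time(new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(50);
                return "done";
            }
        });
        System.out.println(t.value + " took " + t.millis + " ms");
    }
}
```

In the real program you would put the suspect call, for example `FileSystem.get(conf)`, inside the anonymous `Callable` and compare the timings at home and at the office.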

UPDATE2: I've narrowed this down a little further. The biggest hang-up seems to be the call to `KerberosName.setConfiguration()`. That would explain why it runs fast at work: Active Directory there acts as a Kerberos server. I don't have one at home, so it can't find one. Now the question is why in the world it's trying to load the Java Kerberos stuff at all.
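One quick check is whether the JDK can find any Kerberos configuration file at all. A plausible explanation (not confirmed in the question) is that, without one, the JDK tries to work out the default realm in other ways, such as DNS lookups, which can be slow on some networks. This sketch checks candidate locations in an order that approximates the JDK's lookup on Linux; the exact order and paths are assumptions and vary between JDK versions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Rough check for which krb5.conf the JDK is likely to read on Linux (approximate order). */
public final class Krb5ConfCheck {

    /** Candidate locations, in the order this sketch checks them. */
    public static List<Path> candidates() {
        List<Path> paths = new ArrayList<>();
        // An explicit system property, if set, is checked first.
        String explicit = System.getProperty("java.security.krb5.conf");
        if (explicit != null) {
            paths.add(Paths.get(explicit));
        }
        String javaHome = System.getProperty("java.home");
        // Older JDKs keep the file under lib/security, newer ones under conf/security.
        paths.add(Paths.get(javaHome, "lib", "security", "krb5.conf"));
        paths.add(Paths.get(javaHome, "conf", "security", "krb5.conf"));
        // The usual system-wide default on Linux.
        paths.add(Paths.get("/etc/krb5.conf"));
        return paths;
    }

    /** Returns the first candidate that exists as a regular file, or null if none does. */
    public static Path firstExisting(List<Path> paths) {
        for (Path p : paths) {
            if (Files.isRegularFile(p)) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Path found = firstExisting(candidates());
        if (found == null) {
            System.out.println("No krb5.conf found; realm lookup may fall back to slower methods.");
        } else {
            System.out.println("Kerberos config likely read from: " + found);
        }
    }
}
```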

Dave Kincaid

1 Answer


I found a solution (or at least a workaround). I installed the krb5-kdc package, and now my little program runs fast without any unexplained pauses. I then removed krb5-kdc and tested again; it still ran fast. When I removed /etc/krb5.conf, the pause came back. So it looks like using the Hadoop library, at least on Ubuntu, requires an /etc/krb5.conf file.
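If you can't write to /etc, an alternative the answer does not test is to write a minimal krb5.conf somewhere you can write and point the JDK at it with the standard `java.security.krb5.conf` system property. This is a sketch under assumptions: `EXAMPLE.COM` is a placeholder realm, and whether a file this small avoids the pause in this particular setup is unverified. The property has to be set before anything reads the Kerberos configuration, since the JDK typically caches it once loaded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: use a per-user krb5.conf instead of /etc/krb5.conf. */
public final class LocalKrb5Conf {

    /** Minimal file contents; EXAMPLE.COM is a hypothetical placeholder realm. */
    public static final String CONTENTS =
        "[libdefaults]\n" +
        "    default_realm = EXAMPLE.COM\n";

    /** Writes CONTENTS to a temp file and points java.security.krb5.conf at it. */
    public static Path install() throws IOException {
        Path conf = Files.createTempFile("krb5", ".conf");
        Files.write(conf, CONTENTS.getBytes(StandardCharsets.UTF_8));
        // Must run before any Kerberos or Hadoop security code loads its configuration.
        System.setProperty("java.security.krb5.conf", conf.toString());
        return conf;
    }

    public static void main(String[] args) throws IOException {
        Path conf = install();
        System.out.println("Wrote " + conf + "; java.security.krb5.conf="
            + System.getProperty("java.security.krb5.conf"));
        // ...then create the Hadoop Configuration / FileSystem as usual.
    }
}
```

The same property can also be passed on the command line with `-Djava.security.krb5.conf=/path/to/krb5.conf`, which avoids any ordering concerns inside the program.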

Maybe this will help someone else.

Dave Kincaid
  • I ran into the same problem on Ubuntu 12.04.2 LTS; running `sudo apt-get install krb5-config` created the /etc/krb5.conf file, which fixed things for me. – Josh Rosen Jun 30 '13 at 01:18