Problems running Mahout and Hadoop

Question

I'm new to Mahout and Hadoop.

I've successfully installed a Hadoop cluster on 3 machines, and the cluster is running fine. I just installed Mahout on the main namenode for "testing purposes", followed the installation instructions, and set JAVA_HOME. But when I try to run classify-20newsgroups.sh, it downloads the dataset and then fails with the following error:

Error: JAVA_HOME is not set

I then rechecked .bashrc and confirmed that JAVA_HOME is set correctly, but that doesn't help.

Also, how do I verify that Mahout is configured to run on Hadoop correctly? Do you know of any example that can verify this configuration or environment?

score 0 · Answer 1 · edited May 23 '17 at 12:20

A login shell reads .bash_profile, while .bashrc is read only by interactive non-login shells. So you can either source .bashrc from .bash_profile (see "What's the difference between .bashrc, .bash_profile, and .environment?") or simply set and export JAVA_HOME in .bash_profile.
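A minimal sketch of the first approach, assuming bash is the login shell and that JAVA_HOME is already exported in ~/.bashrc. These lines would go in ~/.bash_profile so that login shells pick up the same settings:

```shell
# ~/.bash_profile (sketch): make login shells read ~/.bashrc as well.
# Assumes ~/.bashrc contains a line such as: export JAVA_HOME=/path
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
```

After logging in again, `echo "$JAVA_HOME"` should print the configured path if the file was read.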

There are several other ways to set JAVA_HOME:

1) reload .bashrc in the current terminal

~$ source .bashrc

2) set and export JAVA_HOME in the open terminal before running classify-20newsgroups.sh

~$ export JAVA_HOME=/path
~$ classify-20newsgroups.sh

3) pass JAVA_HOME on the command line when running classify-20newsgroups.sh, i.e.

~$ JAVA_HOME=/path classify-20newsgroups.sh
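One detail worth checking for option 2: a plain assignment creates a shell variable that child processes (such as a script) do not inherit unless it is exported. Option 3 avoids this because a `VAR=value command` prefix places the variable in that command's environment. The sketch below illustrates the difference with a placeholder variable, DEMO_HOME, which is used only for this demonstration:

```shell
# Sketch: DEMO_HOME is a placeholder variable used only for this demonstration.
unset DEMO_HOME

# Plain assignment: visible in this shell only.
DEMO_HOME=/tmp/demo
child_plain=$(sh -c 'echo "${DEMO_HOME:-unset}"')

# After export, child processes inherit the variable.
export DEMO_HOME
child_exported=$(sh -c 'echo "${DEMO_HOME:-unset}"')

echo "without export: $child_plain"
echo "with export:    $child_exported"
```

If JAVA_HOME was set in .bashrc without `export`, the same effect could explain why the script still reports that it is not set.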

As for the question about configuring Mahout to run on Hadoop: the standard classify-20newsgroups example should run on Hadoop if HADOOP_HOME is set.
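A quick environment check before running the example might look like the sketch below. The function `check_mahout_env` is a hypothetical helper, not part of Mahout or Hadoop. It assumes the usual layout with `bin/java` under JAVA_HOME and `bin/hadoop` under HADOOP_HOME, and it assumes (as far as I know from the Mahout launcher script) that a non-empty MAHOUT_LOCAL makes Mahout run locally instead of on the cluster:

```shell
# Hypothetical helper: reports common reasons Mahout would not run on Hadoop.
# Returns 0 when the environment looks usable, 1 otherwise.
check_mahout_env() {
    cme_status=0
    if [ -z "${JAVA_HOME:-}" ] || [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME is unset or has no executable bin/java: '${JAVA_HOME:-}'"
        cme_status=1
    fi
    if [ -z "${HADOOP_HOME:-}" ] || [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "HADOOP_HOME is unset or has no executable bin/hadoop: '${HADOOP_HOME:-}'"
        cme_status=1
    fi
    # Assumption: a non-empty MAHOUT_LOCAL switches Mahout to local mode.
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        echo "MAHOUT_LOCAL is set, so Mahout is expected to run locally"
        cme_status=1
    fi
    if [ "$cme_status" -eq 0 ]; then
        echo "environment looks ready for running Mahout on Hadoop"
    fi
    return "$cme_status"
}
```

Run `check_mahout_env` in the same terminal that will launch classify-20newsgroups.sh, since that is the environment the script inherits.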

score 0 · Answer 2 · answered Aug 25 '14 at 13:46

You might need to set JAVA_HOME explicitly in hadoop-env.sh.

In hadoop-env.sh, look for the comment "# The java implementation to use." and modify the JAVA_HOME path under it.

It should look something like this:

# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Of course, adjust the JAVA_HOME path to match your own Java installation.
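If you are unsure which path to use, one way to derive a candidate is to resolve the `java` binary found on PATH and strip the trailing `bin/java`. The sketch below assumes a `readlink` that supports `-f` (as in GNU coreutils), and `guess_java_home` is a hypothetical helper name:

```shell
# Hypothetical helper: prints a candidate JAVA_HOME derived from the java
# binary on PATH. Assumes readlink supports -f (GNU coreutils).
guess_java_home() {
    gjh_bin=$(command -v java) || return 1      # locate java on PATH
    gjh_real=$(readlink -f "$gjh_bin") || return 1  # resolve symlinks
    dirname "$(dirname "$gjh_real")"            # drop the trailing /bin/java
}
```

For example, `guess_java_home` prints the candidate path, which you can then copy into hadoop-env.sh. On some older JDK layouts the resolved binary lives under a `jre` subdirectory, so the result may end in `/jre`. Check the printed path before using it.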
