
I have set up a single pseudo-distributed node (localhost) with Hadoop 2.8.2 on openSUSE Tumbleweed 20170703. The Java version is 1.8.0_151. Generally, it seems to be set up correctly: I can format the namenode with no errors, etc.

However, when I run hadoop fs -ls, files/dirs from the current local working directory are returned, rather than the expected behaviour of listing the files on the HDFS volume (which should be nothing at the moment).

I was originally following this guide for CentOS (making changes as required) and the Apache Hadoop guide.

I'm assuming that it's a config issue, but I can't see why it would be. I've played around with core-site.xml and hdfs-site.xml as shown below, with no luck.
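For reference, one quick way to see which filesystem the client actually resolves (a diagnostic sketch; hdfs getconf ships with the standard distribution, and I'm assuming the hadoop binaries are on the PATH):

    # If core-site.xml is being read, this should print hdfs://localhost:9000.
    # If it prints file:///, the client has fallen back to the local filesystem,
    # which would explain -ls returning the local working directory.
    hdfs getconf -confKey fs.defaultFS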

/opt/hadoop-hdfs-volume/ exists and is assigned to the user hadoop in the group hadoop, as is the /opt/hadoop/ directory (for the binaries).


EDIT:

/tmp/hadoop-hadoop/dfs/name is where the hdfs namenode -format command writes its files. /tmp/ also seems to hold directories for my own user (/tmp/hadoop-dijksterhuis) and for the hadoop user.

This seems odd to me considering the *-site.xml config files below: /tmp/hadoop-${user.name} is Hadoop's built-in default for hadoop.tmp.dir, which suggests my core-site.xml isn't being picked up at all.

I have tried restarting the DFS and YARN services with the .sh scripts in the hadoop/sbin/ directory. I have also rebooted. No luck!
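For clarity, this is the restart I mean (script names as they ship in the Hadoop 2.x sbin/ directory; the /opt/hadoop prefix is just my layout):

    # Stop and start HDFS and YARN using the bundled scripts.
    /opt/hadoop/sbin/stop-yarn.sh
    /opt/hadoop/sbin/stop-dfs.sh
    /opt/hadoop/sbin/start-dfs.sh
    /opt/hadoop/sbin/start-yarn.sh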


core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-hdfs-volume/${user.name}</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>${hadoop.tmp.dir}/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>${hadoop.tmp.dir}/dfs/name</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
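For completeness, with the two files above I would expect the daemons to resolve both directories under /opt/hadoop-hdfs-volume/ via hadoop.tmp.dir. A quick check (same getconf tool as above; a verification sketch, not output from my machine):

    # Both values should expand to /opt/hadoop-hdfs-volume/<user>/dfs/...
    hdfs getconf -confKey dfs.namenode.name.dir
    hdfs getconf -confKey dfs.datanode.data.dir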

Anyone have any ideas? I can provide more details if needed.


1 Answer


I managed to hack a fix together via another SO answer:

Add export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop to the hadoop user's .bashrc.
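In shell form, that is (a minimal sketch; the path assumes Hadoop is unpacked under /opt/hadoop as described in the question):

    # Appended to the hadoop user's ~/.bashrc:
    # point the hadoop/hdfs commands at the directory containing core-site.xml etc.
    export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop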

This has the effect of overriding the value in etc/hadoop/hadoop-env.sh, which was still pointing the namenode at the default /tmp/hadoop-${user.name} directory.

source the .bashrc and voilà! Problem fixed.
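To verify, a sketch of the steps I'd run after the change (paths as above):

    # Reload the environment, re-format, restart HDFS, then list the HDFS root.
    source ~/.bashrc
    hdfs namenode -format
    /opt/hadoop/sbin/start-dfs.sh
    hadoop fs -ls /    # now lists the (empty) HDFS root rather than the local cwd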
