I have been using Cloudera's Hadoop (0.20.2). With this version, if I put a file into the filesystem and the directory structure did not exist, it automatically created the parent directories.

So for example, if I had no directories in HDFS and typed:

hadoop fs -put myfile.txt /some/non/existing/path/myfile.txt

It would create all of the directories (some, non, existing, and path) and put the file there.

Now, with a newer version of Hadoop (2.2.0), this automatic creation of directories is not happening. The same command yields:

put: `/some/non/existing/path/': No such file or directory

I have a workaround of doing a hadoop fs -mkdir first for every put, but that is not going to perform well.

Is this configurable? Any advice?

owly
  • Why won't it perform well? – Mike Park May 07 '14 at 20:14
  • >> Why won't it perform well? Because for every 'put' I'm doing a mkdir - which most of the time may not be needed, so it is going to impact performance in high throughput situations. – owly May 08 '14 at 08:53
  • Have you considered writing your own solution? I'm surprised `put` performs well at all considering every call has to start a VM, read the configuration, etc... – Mike Park May 08 '14 at 13:49
  • Hi, No we haven't, but I guess it is something to consider. I was hoping that there could be an easy solution to this issue (of not creating parent dirs) out of the box. – owly May 09 '14 at 07:57

4 Answers


Now you should use hadoop fs -mkdir -p <path>
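If the goal is to avoid a separate failing step before every put, the two commands can also be chained. A minimal sketch, assuming a helper along these lines (the function name, paths, and the HDFS_CMD override are illustrative, not part of the original answer):

```shell
# Hypothetical helper: create any missing parent directories, then
# upload. -mkdir -p is idempotent, so this is safe to run repeatedly.
# HDFS_CMD is an assumed override so the commands can be dry-run.
HDFS_CMD=${HDFS_CMD:-"hadoop fs"}

put_with_parents() {
  dest_dir=$2
  $HDFS_CMD -mkdir -p "$dest_dir" && \
    $HDFS_CMD -put "$1" "$dest_dir/$(basename "$1")"
}

# Example invocation:
# put_with_parents myfile.txt /some/non/existing/path
```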

Frederic
art-vybor
  • how does this answer the question? – hlagos Mar 06 '17 at 18:03
    Unfortunately it's not particularly efficient, since JVM needs to spin up for the mkdir command, but the '-p' option does have the nice benefit that it won't error if the directory exists. Makes error handling much cleaner. – Burrito Dec 05 '19 at 20:12

The put operation does not create the target directory if it is not present, so we need to create the directory before doing the put operation.

You can use the following to create the directory.

hdfs dfs -mkdir -p <path>

-p

The -p flag creates the parent directories first if they don't exist. If a directory already exists, it does not print an error and moves on to create the remaining sub-directories.
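The same -p semantics apply to the local POSIX mkdir, which makes them easy to demonstrate without an HDFS cluster (the temporary directory below is purely illustrative):

```shell
# -p creates all missing parents in one call; re-running against an
# existing path succeeds silently instead of erroring out.
base=$(mktemp -d)
mkdir -p "$base/a/b/c" && echo "first run: created"
mkdir -p "$base/a/b/c" && echo "second run: no error"
```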

Vijayant
  • While this code snippet may solve the problem, it doesn't explain why or how it answers the question. Please [include an explanation for your code](//meta.stackexchange.com/q/114762/269535), as that really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion. – Luca Kiebel Jan 10 '22 at 08:49

EDITORIAL NOTE: WARNING THIS ANSWER IS INDICATED TO BE INCORRECT


hadoop fs ... is deprecated; instead use: hdfs dfs -mkdir ...

Dennis Jaheruddin
aName
  • hadoop dfs -mkdir /mnt/hdfs DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. – Anshul Feb 13 '17 at 15:35

Placing a file into a non-existent directory in HDFS requires a two-step process. As @art-vybor stated, use the '-p' option of mkdir to create multiple missing path elements. But since the OP asked how to place the file into HDFS, the following also performs the hdfs put, and note that you can also (optionally) check that the put succeeded, and conditionally remove the local copy.

First create the relevant directory path in HDFS, and then put the file into it. You will want to check that the local file exists before placing it into HDFS, and you may want to log/show that the file has been successfully placed. The following combines all the steps.

fn=myfile.txt
if [ -f "$fn" ] ; then
  bfn=$(basename "$fn") #trim path from filename
  hdfs dfs -mkdir -p /here/is/some/non/existant/path/in/hdfs/
  hdfs dfs -put "$fn" /here/is/some/non/existant/path/in/hdfs/"$bfn"
  hdfs dfs -ls /here/is/some/non/existant/path/in/hdfs/"$bfn"
  success=$? #exit status 0 means the file landed in hdfs
  if [ "$success" -eq 0 ] ; then
    echo "remove local copy of file $fn"
    #rm -f "$fn" #uncomment if you want to remove the local file
  fi
fi

And you can turn this into a shell script that takes a hadoop path and a list of files (and creates the path only once):

#!/bin/bash
hdfsp=${1}
shift
hdfs dfs -mkdir -p "$hdfsp"
for fn in "$@"; do
  if [ -f "$fn" ] ; then
    bfn=$(basename "$fn") #trim path from filename
    hdfs dfs -put "$fn" "$hdfsp/$bfn"
    hdfs dfs -ls "$hdfsp/$bfn" >/dev/null
    success=$? #exit status 0 means the file landed in hdfs
    if [ "$success" -eq 0 ] ; then
      echo "remove local copy of file $fn"
      #rm -f "$fn" #uncomment if you want to remove the local file
    fi
  fi
done
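The $1/shift argument handling the script relies on can be seen in isolation (the destination and filenames below are illustrative, not real paths):

```shell
# First positional argument is the HDFS destination; after shift,
# "$@" holds only the local files to upload.
set -- /data/landing a.txt b.txt   # simulate command-line arguments
hdfsp=$1
shift
echo "destination: $hdfsp"
for fn in "$@"; do
  echo "file: $fn"
done
```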
ChuckCottrill