I know du -sh works on common Linux filesystems. How do I do the same with HDFS?

12 Answers
Prior to 0.20.203, and officially deprecated in 2.6.0:
hadoop fs -dus [directory]
Since 0.20.203 / 1.0.4, and still compatible through 2.6.0:
hdfs dfs -du [-s] [-h] URI [URI …]
You can also run hadoop fs -help for more info and specifics.

-du -s (-dus is deprecated) – Carlos Rendon Jan 03 '13 at 22:11
hadoop fs -du -s -h /path/to/dir
displays a directory's size in readable form.

For newer versions of HDFS, `hdfs dfs -du -s -h /path/to/dir` is more appropriate. – Adelson Araújo Nov 05 '19 at 18:42
Extending Matt D's and others' answers: as of Apache Hadoop 3.0.0, the command is
hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]
It displays sizes of files and directories contained in the given directory or the length of a file in case it's just a file.
Options:
- The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files. Without the -s option, the calculation is done by going 1-level deep from the given path.
- The -h option will format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).
- The -v option will display the names of columns as a header line.
- The -x option will exclude snapshots from the result calculation. Without the -x option (default), the result is always calculated from all INodes, including all snapshots under the given path.
du returns three columns with the following format:
+-------------------------------------------------------------------+
| size | disk_space_consumed_with_all_replicas | full_path_name |
+-------------------------------------------------------------------+
Example command:
hadoop fs -du /user/hadoop/dir1 \
/user/hadoop/file1 \
hdfs://nn.example.com/user/hadoop/dir1
Exit Code: Returns 0 on success and -1 on error.
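
As an illustration of the three-column layout described above, here is a minimal Python sketch that parses plain du output. It assumes the command was run without -h or -v, and that sizes are plain integers; the names DuEntry and parse_du, and the sample lines, are made up for this example.

```python
from typing import List, NamedTuple


class DuEntry(NamedTuple):
    size: int           # logical size in bytes (first column)
    disk_consumed: int  # bytes consumed across all replicas (second column)
    path: str           # full path name (third column)


def parse_du(output: str) -> List[DuEntry]:
    """Parse three-column `hadoop fs -du` output (assumes no -h and no -v)."""
    entries = []
    for line in output.splitlines():
        parts = line.split(None, 2)  # at most three fields; the path keeps its spaces
        if len(parts) != 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue  # skip blank lines, headers, or unexpected formats
        entries.append(DuEntry(int(parts[0]), int(parts[1]), parts[2].strip()))
    return entries


# Hypothetical sample output, not taken from a real cluster.
sample = (
    "1024  3072  /user/hadoop/dir1/a.txt\n"
    "2048  6144  /user/hadoop/dir1/b.txt\n"
)
for entry in parse_du(sample):
    print(entry.path, entry.size, entry.disk_consumed)
```

Older Hadoop releases print only two columns (size and path), so this sketch would skip their lines rather than misread them.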

+1 for the information about the results! I didn't understand why I was getting two results (size and disk_space) instead of one. Thanks! – Ric S Mar 02 '21 at 13:26
With this you will get size in GB
hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3)) " [GB]\t" $2 }'
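
For readers who prefer to post-process the output in Python, the same conversion could be sketched roughly as follows. It assumes plain du output without -h, takes the first field as the size in bytes and the last field as the path (so paths containing spaces are not handled), and the function name du_in_gb and the sample lines are invented for the example.

```python
def du_in_gb(output: str):
    """Return (whole_gb, path) pairs from plain `hdfs dfs -du` output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # ignore blank or non-numeric lines
        # Integer division truncates, like awk's int() for non-negative sizes.
        rows.append((int(fields[0]) // 1024 ** 3, fields[-1]))
    return rows


# Hypothetical sample lines: one two-column, one three-column.
sample = (
    "5368709120 /data/a\n"
    "1610612736 4831838208 /data/b\n"
)
for gb, path in du_in_gb(sample):
    print(f"{gb} [GB]\t{path}")
```

Taking the last field as the path keeps the sketch usable with both the two-column and the three-column output formats.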

hdfs dfs -du PATHTODIRECTORY | awk '/^[0-9]+/ { print int($1/(1024**3) " [GB]\t" $2 }' - Please update your command. Two closing brackets after 1024**3; it should be only 1. – gubs Sep 14 '18 at 14:40
When trying to calculate the total size of a particular group of files within a directory, the -s option does not work (in Hadoop 2.7.1). For example:
Directory structure:
some_dir
├abc.txt
├count1.txt
├count2.txt
└def.txt
Assume each file is 1 KB in size. You can summarize the entire directory with:
hdfs dfs -du -s some_dir
4096 some_dir
However, if I want the sum of all files containing "count" the command falls short.
hdfs dfs -du -s some_dir/count*
1024 some_dir/count1.txt
1024 some_dir/count2.txt
To get around this I usually pass the output through awk.
hdfs dfs -du some_dir/count* | awk '{ total+=$1 } END { print total }'
2048
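
The same workaround can be sketched in Python, which may be handier inside a larger script. This is only a sketch: total_bytes and du_total are names made up here, the parsing assumes plain output without -h, and du_total assumes the hdfs binary is on the PATH.

```python
import subprocess


def total_bytes(du_output: str) -> int:
    """Sum the first column (size in bytes) of plain `hdfs dfs -du` output."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            total += int(fields[0])
    return total


def du_total(pattern: str) -> int:
    """Run `hdfs dfs -du <pattern>` and sum the sizes (needs a working HDFS client)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", pattern],
        capture_output=True, text=True, check=True,
    )
    return total_bytes(result.stdout)


# Output copied from the example above; du_total("some_dir/count*") would
# produce the same sum on a cluster with that directory.
sample = "1024 some_dir/count1.txt\n1024 some_dir/count2.txt\n"
print(total_bytes(sample))
```

Because the pattern is passed as a list argument rather than through a shell, the glob is left for HDFS to expand instead of the local shell.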

The easiest way to get the folder size in a human-readable format is
hdfs dfs -du -h /folderpath
where -s can be added to get the total sum.

To get the size of a directory, hdfs dfs -du -s -h /$yourDirectoryName can be used. For a quick cluster-level storage report, use hdfs dfsadmin -report.

The -s did the trick, otherwise, it gave me a full list of files which I then have to tally up. – Hein du Plessis Jul 01 '21 at 19:53
% of used space on the Hadoop cluster:
sudo -u hdfs hadoop fs -df
Capacity under specific folder:
sudo -u hdfs hadoop fs -du -h /user

I got an error with "hdfs", the way it worked for me was: `hadoop fs -du -h /user` (i didn't need to use `sudo`) – diens Jan 04 '19 at 15:23
hdfs dfs -count <dir>
info from man page:
-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ... :
Count the number of directories, files and bytes under the paths
that match the specified file pattern. The output columns are:
DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
or, with the -q option:
QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA
DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
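
To make the default column order concrete, here is a small Python sketch that turns one line of -count output (without -q, -h or -v) into a dictionary. The function name parse_count and the sample line are hypothetical; the sample simply mirrors a directory holding four 1 KB files.

```python
def parse_count(line: str) -> dict:
    """Parse one line of `hdfs dfs -count` output (no -q, -h, or -v)."""
    # Column order: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
    dirs, files, size, path = line.split(None, 3)
    return {
        "dir_count": int(dirs),
        "file_count": int(files),
        "content_size": int(size),  # bytes
        "path": path.strip(),
    }


# Hypothetical sample line, not captured from a real cluster.
sample_line = "           1            4               4096 some_dir"
print(parse_count(sample_line))
```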
In case someone needs a Pythonic way :)
Install the hdfs Python package:
pip install hdfs
Code:
from hdfs import InsecureClient
client = InsecureClient('http://hdfs_ip_or_nameservice:50070', user='hdfs')
folder_info = client.content("/tmp/my/hdfs/path")
# prints folder/directory size in bytes
print(folder_info['length'])
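
The dictionary returned by client.content() is the WebHDFS content summary, which also carries a spaceConsumed field alongside length. As a rough sketch, the two can be compared to see the effect of replication; describe_summary and the sample values below are made up for illustration and do not come from a real cluster.

```python
def describe_summary(summary: dict) -> str:
    """Format a WebHDFS-style content summary dict as a one-line report."""
    length = summary["length"]           # logical size in bytes
    consumed = summary["spaceConsumed"]  # bytes used across all replicas
    ratio = consumed / length if length else 0.0
    return f"{length} bytes logical, {consumed} bytes on disk (x{ratio:.1f})"


# Hypothetical values; with a real client this would be
# describe_summary(client.content("/tmp/my/hdfs/path")).
fake_summary = {
    "length": 4096,
    "spaceConsumed": 12288,
    "fileCount": 4,
    "directoryCount": 1,
}
print(describe_summary(fake_summary))
```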

The command should be hadoop fs -du -s -h /dirPath
-du [-s] [-h] ... : Show the amount of space, in bytes, used by the files that match the specified file pattern.
-s : Rather than showing the size of each individual file that matches the pattern, shows the total (summary) size.
-h : Formats the sizes of files in a human-readable fashion rather than a number of bytes. (Ex MB/GB/TB etc)
Note that, even without the -s option, this only shows size summaries one level deep into a directory.
The output is in the form size name(full path)
