
I am trying to use the command hdfs dfs -du -h to list the sizes of files and folders. The command I use is hdfs dfs -du -h /path_name/folder_name, and the result returned looks like this:

9.2 G   27.5 G  /path_name/folder_name/xxx01.parquet
0       0       /path_name/folder_name/xxx02.parquet
19.9 M  59.6 M  /path_name/folder_name/xxx03.parquet

I know the Hadoop command line borrows a lot from general file system commands, and -du -h lists folder/file sizes in a human-readable form. However, taking the first result line as an example, what do the two numbers 9.2 G and 27.5 G mean?

Thanks!

helloworld

1 Answer


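Your cluster's replication factor is 3. The first number is the size of the file itself, and the second is the disk space the file consumes including all of its replicas. For example, the actual size of xxx01.parquet is 9.2 GB; because the replication factor is 3, the space consumed with replicas is about 3 × 9.2 GB ≈ 27.5 GB (the small mismatch with 27.6 comes from the one-decimal rounding of the -h output).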

Column layout of the output:

size    disk_space_consumed_with_all_replicas    full_path
Rahim Dastar