
I'm trying to evaluate the disk usage of our system. We have Infinidat storage, and we need to know how large our data will be on non-Infinidat storage. I got three different answers:

1) By typing 'du -sh' I get 154T.

2) By typing 'du -sh --apparent-size' I get 212T.

3) By running a recursive python script based on the results of 'os.stat().st_size' I get 304T.

I understand the differences might be due to sparse files, block size, and perhaps some compression done by the storage system (Infinidat). Still, any ideas on how I can estimate the size of the data on different storage?
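For comparison, here is a minimal sketch of a tree walk that reports both numbers at once: the apparent size (the sum of `st_size`, which is what `du --apparent-size` is based on) and the allocated size (`st_blocks * 512`, which is what plain `du` is based on). The function name `tree_sizes` is hypothetical. It assumes a Unix-like system where `st_blocks` is available, counts regular files only, does not follow symlinks, and counts each hard-linked inode once. A script that calls `os.stat()` on every directory entry follows symlinks and counts hard links repeatedly, which is one possible (unconfirmed) reason for a total larger than what `du` reports.

```python
import os
import stat


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for regular files under root.

    Symlinks are not followed, and each (device, inode) pair is counted
    once, so hard-linked files are not counted twice.
    """
    seen = set()      # (st_dev, st_ino) pairs that were already counted
    apparent = 0      # sum of st_size
    allocated = 0     # sum of st_blocks * 512
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            # lstat describes the link itself instead of its target
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices, ...
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hard link to an inode already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated
```

Calling `tree_sizes("/path/to/data")` returns both totals in bytes. Because `du` also counts directories and the symlinks themselves, the totals will not match `du` exactly, but they should be close.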

More details on one specific case: the system contains many 'sam' files, which store bioinformatics data. For the file "x.sam", for example, 'ls' shows a size of 12G. Running 'du -sh x.sam' gives 2.5G, and running 'du -sh --apparent-size x.sam' gives 12G. So what is the real size of the file, and what will its size be on a new Dell-based storage?
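The two numbers for a single file can be reproduced in Python. The sketch below assumes a Unix-like system; the helper names `apparent_and_allocated` and `make_sparse_file` are hypothetical. The first helper returns `st_size` (what 'ls' and 'du --apparent-size' show) together with `st_blocks * 512` (what plain 'du' shows). The second creates a file with a large logical length without writing any data, which on filesystems that support sparse files usually allocates far fewer blocks than the apparent size. Whether the gap on "x.sam" comes from sparseness or from storage-side compression cannot be determined from these numbers alone.

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    st = os.stat(path)
    # st_size is the logical length; st_blocks counts 512-byte units on disk
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file whose logical length is `size` bytes without writing data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file; the filesystem may leave a hole
```

For example, after `make_sparse_file("demo.bin", 10 * 1024 * 1024)`, calling `apparent_and_allocated("demo.bin")` returns 10485760 as the first value, while the second value depends on the filesystem.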

GSB
  • Could be a possible duplicate of https://stackoverflow.com/questions/5694741/why-is-the-output-of-du-often-so-different-from-du-b/5694854#5694854 – Mortz Dec 05 '18 at 12:27
  • Have you asked Infinidat what numbers you should use? They should be able to tell you, and then you can deduce the average compression ratio (average, since it will of course be different for each file, depending on the type of file and data inside). – 9769953 Dec 05 '18 at 12:48
  • Yes, I asked Infinidat. They suggested using the results of --apparent-size, but the difference between ~200T and ~300T is too big, so I'd be happy if someone has more insights. – GSB Dec 05 '18 at 13:44
