48

Why is the output of du often so different from du -b? -b is shorthand for --apparent-size --block-size=1. Using only --apparent-size gives me the same result most of the time, but adding --block-size=1 seems to do the trick. I wonder whether that output is even correct, and which numbers are the ones I want (i.e. the actual file size, if copied to another storage device)?

knittl
  • 11
    Why the downvote? This looks like a very good question. Please have the courtesy to comment if you're going to downvote a question or an answer so that everyone can learn something. An anonymous downvote is a potential teaching moment thrown away. – Pete Wilson Apr 17 '11 at 16:38
  • 1
    @Pete: probably because this is off topic for StackOverflow. I'm hoping a few more high-reputation users will notice. – Ken Bloom Apr 17 '11 at 20:41
  • Related question on ServerFault: https://serverfault.com/questions/290088/what-is-the-difference-between-du-h-and-ls-lh – GDP2 Feb 25 '19 at 20:22

5 Answers

45

Apparent size is the number of bytes your applications think are in the file. It's the amount of data that would be transferred over the network (not counting protocol headers) if you decided to send the file over FTP or HTTP. It's also the result of cat theFile | wc -c, and the amount of address space that the file would take up if you loaded the whole thing using mmap.

Disk usage is the amount of space that can't be used for something else because your file is occupying that space.

In most cases, the apparent size is smaller than the disk usage because the disk usage counts the full size of the last (partial) block of the file, and apparent size only counts the data that's in that last block. However, apparent size is larger when you have a sparse file (sparse files are created when you seek somewhere past the end of the file, and then write something there -- the OS doesn't bother to create lots of blocks filled with zeros -- it only creates a block for the part of the file you decided to write to).
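
To see the two numbers side by side, here is a minimal sketch, assuming GNU coreutils (du, stat, wc). The temporary directory and the file name theFile are only illustrative. It checks that wc -c, stat and du --apparent-size agree on the apparent size, while plain du reports the space that is allocated:

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils; the paths are hypothetical.
tmpdir=$(mktemp -d)
file="${tmpdir}/theFile"
printf 'hello world\n' > "${file}"   # 12 bytes of data

apparent_wc=$(wc -c < "${file}")                                      # bytes a reader sees
apparent_stat=$(stat -c %s "${file}")                                 # st_size from the inode
apparent_du=$(du --apparent-size --block-size=1 "${file}" | cut -f1)  # same as du -b
disk_usage=$(du --block-size=1 "${file}" | cut -f1)                   # bytes allocated on disk

echo "wc -c               ${apparent_wc}"
echo "stat -c %s          ${apparent_stat}"
echo "du --apparent-size  ${apparent_du}"
echo "du (disk usage)     ${disk_usage}"

rm -rf "${tmpdir}"
```

The first three numbers should all be 12. The last one depends on the filesystem; it is usually a whole number of blocks.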

Ken Bloom
  • thanks! that's a thorough explanation. then why do i need to have `--block-size=1` to have the same output as `wc -c theFile` (saving the cat process). looks like `du` only outputs the correct number of bytes, when i specify either -h, -k, -m, -B1 etc.? but maybe that's really another question? `du` by default outputs block usage, not byte usage? – knittl Apr 17 '11 at 16:57
7

Minimal block granularity example

Let's play a bit to see what is going on.

mount tells me I'm on an ext4 partition mounted at /.

I find its block size with:

stat -fc %s .

which gives:

4096

Now let's create some files with sizes of 1, 4095, 4096 and 4097 bytes, and test them with --block-size=1, which can also be written -B1 (note that -b is not a synonym: it additionally implies --apparent-size):

#!/usr/bin/env bash
for size in 1 4095 4096 4097; do
  dd if=/dev/zero of=f bs=1 count="${size}" status=none
  echo "size     ${size}"
  echo "real     $(du --block-size=1 f)"
  echo "apparent $(du --block-size=1 --apparent-size f)"
  echo
done

and the results are:

size     1
real     4096   f
apparent 1      f

size     4095
real     4096   f
apparent 4095   f

size     4096
real     4096   f
apparent 4096   f

size     4097
real     8192   f
apparent 4097   f

So we see that any file of 1 to 4096 bytes in fact takes up 4096 bytes on disk.

Then, as soon as the size reaches 4097 bytes, usage goes up to 8192, which is 2 * 4096.

It is clear then that this filesystem allocates space in whole blocks of 4096 bytes.
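
The pattern in those results is just rounding up to the next multiple of the block size. Here is a small sketch of that arithmetic, assuming the 4096-byte block size measured above. The function name allocated_bytes is made up for this illustration, and the formula ignores metadata, compression and other filesystem-specific behavior:

```shell
#!/bin/sh
# Assumption: block size as reported by `stat -fc %s .` in the example above.
block_size=4096

# Hypothetical helper: round a byte count up to a whole number of blocks.
allocated_bytes() {
  echo $(( ($1 + block_size - 1) / block_size * block_size ))
}

for size in 1 4095 4096 4097; do
  echo "size ${size} -> allocated $(allocated_bytes "${size}")"
done
```

This prints 4096 for the first three sizes and 8192 for 4097, matching the "real" column measured with du.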

What happens to sparse files?

I haven't investigated what the exact on-disk representation is, but it is clear that --apparent-size counts the holes as part of the file.

This can lead to apparent sizes being larger than actual disk usage.

For example:

dd seek=1G if=/dev/zero of=f bs=1 count=1 status=none
du --block-size=1 f
du --block-size=1 --apparent-size f

gives:

8192    f
1073741825      f
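
The same two numbers can also be read with stat. This is a sketch, assuming GNU coreutils and a filesystem that supports sparse files; the file name is hypothetical. As far as I know, du derives disk usage from the allocated block count that stat prints with %b and %B:

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and sparse file support.
tmpdir=$(mktemp -d)
file="${tmpdir}/sparse"
truncate -s 1G "${file}"    # sets the length to 1 GiB without writing any data

apparent=$(stat -c %s "${file}")   # st_size: apparent size in bytes
blocks=$(stat -c %b "${file}")     # number of allocated blocks
unit=$(stat -c %B "${file}")       # size in bytes of each block counted by %b
allocated=$(( blocks * unit ))
du_bytes=$(du --block-size=1 "${file}" | cut -f1)

echo "apparent   ${apparent}"
echo "allocated  ${allocated}"
echo "du         ${du_bytes}"

rm -rf "${tmpdir}"
```

The apparent size is 1073741824, while the allocated size stays far below it because no data blocks were written.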

Related: How to test if sparse file is supported

What to do if I want to store a bunch of small files?

Some possibilities are:

Tested in Ubuntu 16.04.

Ciro Santilli OurBigBook.com
  • 1
    This is an excellent answer - the illustration of the points with actual commands (what I would call "experimentation") makes the answer and underlying principles very clear. +1 – bballdave025 Feb 20 '22 at 00:19
  • 1
    @bballdave025 thanks! Yes, I'm obsessed with experimentation, [related comment here](https://cirosantilli.com/ciro-santilli-s-bad-old-event-memory). – Ciro Santilli OurBigBook.com Feb 20 '22 at 08:46
  • 1
    That's a beautiful article! It's going to my archive (where I save my data, because I won't remember it, otherwise : ) Your comments remind me of my two friends and I, who went through undergraduate physics together and kept in touch during our grad school physics careers. One friend always said, "Why memorize when you can deduce from first principles?!" It has been one of our themes. If you can't deduce, just experiment! (Oh we also often said, "Time for the bulldozer method!" – bballdave025 Feb 22 '22 at 02:01
  • @bballdave025 ah, you did physics, that's great! One of my great disappointments in life not having gone for it, what I wouldn't give for a good optical table at my disposal right now... so I'm stuck experimenting with things like `du` instead! – Ciro Santilli OurBigBook.com Feb 22 '22 at 09:20
3

Compare (for example) du -bm to du -m.

The -b sets --apparent-size --block-size=1, but then the -m overrides the block size to be 1M, while --apparent-size stays in effect.

Similarly for -bh versus -h: the -bh means --apparent-size --block-size=1 --human-readable, and again the -h overrides that block size.
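
A quick way to check this, as a sketch assuming GNU coreutils (the file name is hypothetical): the short and long forms below should print the same numbers.

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils.
tmpdir=$(mktemp -d)
file="${tmpdir}/three-mib"
truncate -s 3M "${file}"    # apparent size: 3 * 1024 * 1024 bytes

bytes_short=$(du -b "${file}" | cut -f1)                              # -b
bytes_long=$(du --apparent-size --block-size=1 "${file}" | cut -f1)   # what -b expands to
mib_short=$(du -bm "${file}" | cut -f1)                               # -m overrides the block size
mib_long=$(du --apparent-size --block-size=1M "${file}" | cut -f1)    # equivalent long form

echo "du -b                                   ${bytes_short}"
echo "du --apparent-size --block-size=1       ${bytes_long}"
echo "du -bm                                  ${mib_short}"
echo "du --apparent-size --block-size=1M      ${mib_long}"

rm -rf "${tmpdir}"
```

Both byte counts are 3145728 and both MiB counts are 3, so only the block size changed between -b and -bm.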

eebbesen
  • If one reads the question carefully, this must be the correct answer. The point must be that `--block-size` is often not `1` by default (i.e., `du` without any option) but rather 1024 or 512. `--apparent-size` is irrelevant side effect of `-b`. – norio Jul 11 '19 at 04:08
  • Thanks for this tip, this is much shorter than `--apparent-size`. – Arnie97 Dec 06 '21 at 06:03
2

Files and folders have both a real (apparent) size and a size on disk.

  • --apparent-size is the real size of the file or folder, i.e. the number of bytes of data it contains

  • size on disk is the number of bytes the file or folder takes up on disk. This is what plain du reports.

If you find that the apparent size is almost always several orders of magnitude larger than the disk usage, it means that you have a lot of sparse files, or files with internal fragmentation or indirect blocks.
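
For a whole folder, the two totals can be compared like this. It is a sketch, assuming GNU coreutils and findutils, with made-up file names. Depending on the coreutils version, the apparent total may also include the sizes of the directories themselves, so it can be slightly larger than the sum of the file sizes:

```shell
#!/bin/sh
# Sketch only: assumes GNU coreutils and findutils; names are hypothetical.
tmpdir=$(mktemp -d)
mkdir "${tmpdir}/data"
for name in a b c; do
  head -c 2000 /dev/zero > "${tmpdir}/data/${name}"   # three files of 2000 bytes each
done

# Sum of the regular file sizes only.
files_total=$(find "${tmpdir}/data" -type f -printf '%s\n' | awk '{ sum += $1 } END { print sum }')
# Apparent size of the whole tree, as reported by du.
tree_apparent=$(du -sb "${tmpdir}/data" | cut -f1)
# Space the tree occupies on this filesystem.
tree_on_disk=$(du -s --block-size=1 "${tmpdir}/data" | cut -f1)

echo "sum of file sizes  ${files_total}"
echo "du -sb             ${tree_apparent}"
echo "du -s -B1          ${tree_on_disk}"

rm -rf "${tmpdir}"
```

The sum of the file sizes is the amount of data that has to be copied; how much space it takes on the destination depends on that filesystem's block size.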

hukko
1

Because by default du reports disk usage, which is usually the same as or larger than the file size. As the man page says under --apparent-size:

print apparent sizes, rather than disk usage; although the apparent size is usually smaller, it may be larger due to holes in (`sparse') files, internal fragmentation, indirect blocks, and the like

Brian Carlton
  • so what's 'apparent-size' exactly? and i encounter exactly the opposite: apparent-size is almost always several magnitudes higher than disk usage – knittl Apr 17 '11 at 16:34
  • 1
    Actually by default it can be also smaller on partitions with enabled compression. – ARA1307 Jun 27 '18 at 16:40