
Is there any way to check the final size of a specific file after compression in a squashfs filesystem?

I'm looking through mksquashfs/unsquashfs command line options but I can't find anything.

Using the -info option in mksquashfs only prints the size before compression.

Thanks

tano
  • There isn't really a concept of compressed file size at all, because compression happens at a block level, not a file level. Multiple files can be present in the same block (particularly if they're small), and the overhead for the compression algorithm's tables &c. is shared between them. – Charles Duffy Jul 06 '18 at 15:30
  • Thinking about it, everything needed to calculate a rough ratio is there, so this is doable; it's just not something that's there "for free". Are you willing to write some code for this? (If not, the question should be on [unix.se] or [SuperUser](https://superuser.com/), not SO). And if you're willing to write some code, what languages are you handy with? (It's not pretty code, but there *is* a native-Python library for parsing squashfs). And how much accuracy do you need? – Charles Duffy Jul 06 '18 at 15:35
  • @CharlesDuffy thanks for the reply. That's very clear. Yes, maybe SO wasn't the right place for this question. – tano Jul 06 '18 at 15:49
  • Why am I always asking these types of questions!? Thanks folks! Now I'm obsessed with adding this feature! It can only approximate, because I would need to determine how many entries they end up contributing to the dictionary, what other files/blocks may share those entries, then calculate how much of the block it takes... what a mess. – Daniel Santos Mar 09 '22 at 18:49

1 Answer


This isn't feasible to do with much granularity, because compression is done at block level, not file level.

A file may be recorded as starting 50 KB into the buffer produced by decompressing block 50, and ending 50 bytes into the decompressed block 52 (ignoring fragments here, which are a separate concern) -- but that doesn't let you map back to the position inside the compressed copy of block 50 where the file starts. (You can easily determine the compression ratio for block 51, but you can't easily figure out the ratios for the parts of the file contained in blocks 50 and 52 in this example, because those blocks are shared with other contents.)
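To illustrate the easy part of that: for a block whose decompressed contents belong entirely to one file, the ratio is just its on-disk size divided by the filesystem's block size. A minimal sketch with made-up numbers (nothing here is read from a real image):

    # Block 51 sits wholly inside the file, so its ratio is unambiguous.
    # Hypothetical figures: on-disk size would come from the inode's block
    # list, uncompressed size is the image's block size (128 KiB here).
    compressed, uncompressed = 58_500, 131_072
    print(f"block 51 compressed to {compressed / uncompressed:.1%} of its original size")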

So the information isn't exposed because it isn't easily available. This actually makes storage of numerous (similar) small files significantly more efficient, because a single compression context is used for all of them (and decompressing a block to retrieve one file may mean that you've got files next to it already decompressed in memory)... but without potentially-unfounded assumptions (such as assuming that all contents within a block share that block's average ratio), it doesn't help you work out after the fact how well each individual item compressed, because the items aren't compressed individually in the first place.
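If you're willing to accept that assumption, a rough per-file estimate can be pieced together from the per-block sizes. The following is only a sketch: the data structures are hypothetical stand-ins for whatever you'd actually extract from the image (e.g. with a squashfs-parsing library), with `block_sizes` mapping block index to (compressed, uncompressed) byte counts and the file described by its first block, its byte offset into that block's decompressed data, and its length:

    def estimate_compressed_size(block_sizes, start_block, start_offset, file_length):
        """Rough per-file compressed-size estimate.

        Assumes every byte inside a block compressed at that block's
        average ratio -- exactly the unfounded assumption described
        above -- so treat the result as an approximation only.

        block_sizes: dict {block_index: (compressed_bytes, uncompressed_bytes)}
        """
        remaining = file_length
        offset = start_offset
        block = start_block
        estimate = 0.0

        while remaining > 0:
            compressed, uncompressed = block_sizes[block]
            # Bytes of this file that live in this block's decompressed data.
            in_block = min(remaining, uncompressed - offset)
            # Charge the file for its proportional share of the block's
            # compressed bytes.
            estimate += compressed * (in_block / uncompressed)
            remaining -= in_block
            offset = 0
            block += 1

        return estimate


    # Hypothetical example: a ~300 KiB file starting 50 KiB into block 50,
    # spanning blocks 50..52 of a 128 KiB-block image.
    blocks = {
        50: (61_000, 131_072),
        51: (58_500, 131_072),
        52: (47_200, 131_072),
    }
    print(estimate_compressed_size(blocks, start_block=50,
                                   start_offset=50 * 1024,
                                   file_length=300 * 1024))

The estimate's error comes entirely from the shared boundary blocks (50 and 52 here); the more of the file that lies in fully-owned blocks, the closer it gets.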

Charles Duffy