
http://en.wikipedia.org/wiki/Data_compression#Lossless_data_compression

For any given compression scheme, one can provide sample input that would result in no savings in space, right?

Aaron Fi
  • Obviously. Examples include compressing already compressed data or random data streams. –  Dec 11 '09 at 19:12
  • Are you referring to general-purpose compression schemes, or abstractly speaking of any compression scheme in any domain? For the former: yes. For the latter, I would argue: no. – Dave Mateer Dec 11 '09 at 19:14
  • general-purpose; I already thought about the "zipping a zip file" example. I was thinking about more straightforward questions of, "if I give you a stream of integers, can you compress them always?" – Aaron Fi Dec 11 '09 at 19:17
  • A zip file is a stream of integers. –  Dec 11 '09 at 19:19
  • Heh, good point. I should've just leapfrogged to the "bitstream" point of view. – Aaron Fi Dec 11 '09 at 19:21
  • Dupes: http://stackoverflow.com/questions/1513567/theory-compression-algorithm-that-makes-some-files-smaller-but-none-bigger http://stackoverflow.com/questions/1166385/how-many-times-a-file-be-compressed – Dan Hook Dec 11 '09 at 21:04

5 Answers

5

Yes, there's always something that will grow larger. The pigeonhole principle says that if you have a space of inputs and a 1-to-1 function (which lossless compression must be), then there have to be as many distinct outputs as there are inputs.

If the inputs are files of N bits, then there are 2**N possible inputs, so you need 2**N distinct outputs. But there are only 2**0 + 2**1 + ... + 2**(N-1) = 2**N - 1 files shorter than N bits, so you can't fit that many different outputs into files that are all shorter than N bits.
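
To make the counting concrete, here is a minimal Python sketch of the argument (the value of N is just an arbitrary example):

    N = 10  # example file length in bits
    inputs = 2 ** N                                  # all possible N-bit files
    shorter_outputs = sum(2 ** k for k in range(N))  # all files of 0..N-1 bits
    print(inputs, "inputs vs", shorter_outputs, "shorter outputs")  # 1024 vs 1023

There is always exactly one fewer possible shorter file than there are inputs, so at least one N-bit input cannot map to anything shorter.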

Ned Batchelder
  • doesn't that imply all lossless compression is useless in general? – Aaron Fi Dec 11 '09 at 19:15
  • Theoretically it's not necessary for anything to be larger. But we can guarantee that there are inputs that are at least as large. – recursive Dec 11 '09 at 19:15
  • Of course, if no input gets larger, then no input can get smaller either. – Anon. Dec 11 '09 at 19:17
  • The number of inputs (and outputs) would be 2**(8*N). (8 bits per byte.) – retracile Dec 11 '09 at 19:19
  • Not at all: they work really well on typical inputs. Just because a few files are larger doesn't mean that the average isn't much smaller. – Ned Batchelder Dec 11 '09 at 19:22
  • oops: yes, 2**(8*N), I fixed it. – Ned Batchelder Dec 11 '09 at 19:22
  • @aaron -- the idea is that you cleverly choose which files get bigger and which get smaller. You choose unusual data to be the sacrificial victims that get bigger, and more common data chunks as the ones which are better compressed. – Alex Feinman Dec 11 '09 at 19:26
  • Note that out of the 2^(8·2^30) possible 1 GB bitstreams, only some represent useful data, for example movies. Almost all of these bitstreams are simply random noise to us. Lossless compression makes the useful bitstreams smaller at the cost of making the other, less useful bitstreams bigger. – liori Dec 11 '09 at 19:26
4

For any given compression scheme, one can provide sample input that would result in no savings in space, right?

Yes: A single bit.

Ben S
2

Absolutely.

If that weren't the case, you could conceivably run the compressor's output back through the compressor again and again, ad infinitum, getting better compression each pass until you got all the way down to a single bit. That's obviously impossible.
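
You can watch this break down in practice with a minimal Python sketch using the standard-library zlib module (the input text here is just an arbitrary example):

    import zlib

    data = b"the quick brown fox jumps over the lazy dog " * 100
    for i in range(5):
        out = zlib.compress(data)
        print("pass", i + 1, ":", len(data), "->", len(out), "bytes")
        data = out

The first pass shrinks the data dramatically; the later passes stop helping, and the output grows by a few bytes each time instead.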

whatsisname
1

Correct. Try zipping a zip file ... if the data is already compressed, you won't be able to get further compression.
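
The same point is easy to demonstrate with random bytes, which look statistically similar to already-compressed data; here is a minimal Python sketch (the 4096-byte size is arbitrary):

    import os
    import zlib

    random_data = os.urandom(4096)            # stand-in for already-compressed data
    recompressed = zlib.compress(random_data)
    print(len(random_data), "->", len(recompressed))  # e.g. 4096 -> 4107: no savings

The output is typically a few bytes larger than the input, because the compressor finds no redundancy to exploit and still has to add its own framing overhead.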

Beep beep
0

"if I give you a stream of integers, can you compress them always?"

In the "zipping a zipfile" example, why are you thinking of bytes in the zipfile as something other than a stream of integers?

That was a pretty concise example of a case where you can "defeat" lossless data compression.

drakaan