
I'm trying to compress some strings with PHP, but I'm getting some strange results.

I tried this code, which I found here:

$string = str_repeat('1234567890' . implode('', range('a', 'z')), 48800);
echo strlen($string); // 1756800 bytes
// Compress twice with raw DEFLATE at the highest level (9)
$start = microtime(true);
$compressed = gzdeflate($string, 9);
$compressed = gzdeflate($compressed, 9);
$end = microtime(true);
var_dump($compressed);
echo '<br/>' . strlen($compressed) . '<br/>'; // 99 bytes
// Decompress in reverse order and time it
$start2 = microtime(true);
echo gzinflate(gzinflate($compressed));
$end2 = microtime(true);
echo '<br/>- ' . ($end - $start);   // compression time in seconds
echo '<br/>- ' . ($end2 - $start2); // decompression time in seconds
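To see how much the second gzdeflate() call actually contributes, here is a minimal sketch (not from the original post) that compresses the same test string with one pass and with two passes, then checks that the round trip restores the input. The variable names are illustrative, and exact byte counts depend on the zlib version, so none are hard-coded.

```php
<?php
// Same highly repetitive test string as above: 36 characters repeated 48800 times.
$string = str_repeat('1234567890' . implode('', range('a', 'z')), 48800);

// One pass of raw DEFLATE at the highest level.
$once = gzdeflate($string, 9);

// Second pass: deflating the already-compressed output again.
$twice = gzdeflate($once, 9);

printf("original:   %d bytes\n", strlen($string));
printf("one pass:   %d bytes\n", strlen($once));
printf("two passes: %d bytes\n", strlen($twice));

// Decompression must undo the passes in reverse order.
$restored = gzinflate(gzinflate($twice));
var_dump($restored === $string);
```

On input this repetitive, the first pass emits long runs of near-identical back-references, so a second pass can still find patterns to remove. On typical real-world text the first pass leaves little redundancy, and a second pass usually gains little.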

This returns great results: 1756800 bytes become 99 bytes. That was good enough.

But when I applied this solution to real-world strings, a 3606-byte string only shrank to 1765 bytes, which isn't good enough.

Why is this? Can certain characters change the result?

This is the code that gave me only about a 50% size reduction:

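$text = file_get_contents('doc/test.txt');
// Split at every '>' (-1 = no limit). The pattern has no capture group,
// so PREG_SPLIT_DELIM_CAPTURE had no effect and is dropped here.
$xml = preg_split('/>/', $text, -1);
// Skip the first two fragments
unset($xml[0], $xml[1]);
foreach ($xml as $p) {
    // Compress each fragment on its own, twice, at level 9
    $compact = gzdeflate($p, 9);
    $compact = gzdeflate($compact, 9);
    var_dump(strlen($compact)); // compressed size
    var_dump(strlen($p));       // original size
}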
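The loop above compresses each fragment separately, so repetition between fragments (for example the same tag names recurring in every piece) cannot be exploited. A minimal sketch, assuming a small made-up XML sample in place of doc/test.txt, compares compressing the fragments one by one with compressing the whole text once. The sample data and variable names are hypothetical.

```php
<?php
// Hypothetical XML-like sample standing in for doc/test.txt.
$text = '';
for ($i = 0; $i < 200; $i++) {
    $text .= "<item id=\"$i\"><name>Product $i</name><price>" . ($i * 3) . "</price></item>\n";
}

// Split on '>' as in the question; -1 means "no limit".
$parts = preg_split('/>/', $text, -1);

// Compress every fragment separately and add up the sizes.
$separate = 0;
foreach ($parts as $p) {
    $separate += strlen(gzdeflate($p, 9));
}

// Compress the whole text in one go.
$whole = strlen(gzdeflate($text, 9));

printf("original:                        %d bytes\n", strlen($text));
printf("fragments compressed separately: %d bytes\n", $separate);
printf("whole text compressed once:      %d bytes\n", $whole);
```

Each separately compressed piece pays its own fixed overhead and starts with no history to match against, which is why the single pass over the whole text tends to come out much smaller.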

I got results like these:

int(1760) < compressed
int(3606) < normal
int(2441) < compressed
int(5878) < normal
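Expressed as a fraction of the original size, the reported numbers already amount to roughly a 50% reduction or better. A short sketch computing the ratios from the byte counts above; the helper name ratioPercent() is made up for this example.

```php
<?php
// Hypothetical helper: compressed size as a percentage of the original size.
function ratioPercent(int $compressed, int $original): float
{
    return round(100 * $compressed / $original, 1);
}

// Byte counts copied from the output above.
echo ratioPercent(1760, 3606), "%\n"; // 48.8%
echo ratioPercent(2441, 5878), "%\n"; // 41.5%
```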
Asked by Guerra (edited by Community)
Some text values/data types are more compressible than others, so the results don't look really odd. What you were seeing earlier was probably a best-case scenario. – Maximus2012 Sep 20 '13 at 17:30

2 Answers


Not all data can be compressed equally well: text with repeating words and recognisable patterns is easier to compress than a random sequence of bytes, such as you might find in a binary file. Without knowing the origin of your data, compressing it to 30-50% of its original size sounds pretty good.
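To make the difference concrete, here is a rough sketch comparing text built from a small repeating vocabulary with random bytes of the same length. The word list and the 10000-byte size are arbitrary choices for illustration, and the exact percentages will vary from run to run.

```php
<?php
// Compare how well patterned text and random bytes compress with DEFLATE.
$size = 10000;

// Text built from a small, repeating vocabulary (illustrative sample).
$words = ['alpha', 'beta', 'gamma', 'delta', 'epsilon'];
$text = '';
while (strlen($text) < $size) {
    $text .= $words[array_rand($words)] . ' ';
}
$text = substr($text, 0, $size);

// Cryptographically random bytes: essentially no patterns to exploit.
$random = random_bytes($size);

$textRatio = strlen(gzdeflate($text, 9)) / $size;
$randomRatio = strlen(gzdeflate($random, 9)) / $size;

printf("text:   %.1f%% of original\n", 100 * $textRatio);
printf("random: %.1f%% of original\n", 100 * $randomRatio);
```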

You should know that compressing data does not always save space; the result may even be longer than the original.
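Two easy ways to see the output grow, sketched below: a very short string, where the compressed format's fixed overhead dominates, and random bytes, where there is nothing to remove. The inputs are arbitrary examples.

```php
<?php
// Very short input: the compressed form carries fixed overhead.
$short = 'abc';
$shortOut = gzdeflate($short, 9);

// Random input: nothing to exploit, so the output ends up slightly larger.
$random = random_bytes(10000);
$randomOut = gzdeflate($random, 9);

printf("short:  %d -> %d bytes\n", strlen($short), strlen($shortOut));
printf("random: %d -> %d bytes\n", strlen($random), strlen($randomOut));
```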

Answered by Joni

It all depends on the compression algorithm: some compress more slowly but achieve better compression.
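gzdeflate() always uses DEFLATE, but its level argument (the 9 used in the question) exposes the same speed/size trade-off within that one algorithm. A rough sketch, assuming a made-up sample text; timings from microtime() are only indicative and vary between machines.

```php
<?php
// Illustrative sample with moderate repetition.
$text = '';
for ($i = 0; $i < 5000; $i++) {
    $text .= "record $i: status=" . ($i % 7 === 0 ? 'error' : 'ok') . "\n";
}

$sizes = [];
foreach ([1, 6, 9] as $level) {
    $start = microtime(true);
    $out = gzdeflate($text, $level);
    $elapsed = microtime(true) - $start;
    $sizes[$level] = strlen($out);
    printf("level %d: %d bytes in %.4f s\n", $level, $sizes[$level], $elapsed);
}
```

Higher levels search harder for matches, so they usually spend more time for a smaller result; switching to a different algorithm entirely would need another extension or library.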

Also, the type of data being compressed will affect your results. Lots of repeated characters in the original file will compress well and produce a smaller compressed file.
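The effect of repetition is easy to measure: the sketch below compresses one made-up XML fragment repeated 1, 10 and 100 times. The more copies there are, the more of the input can be encoded as back-references, so the ratio drops sharply, much like the question's synthetic best-case string.

```php
<?php
// Illustrative XML fragment; the content is made up.
$fragment = '<entry><title>Example title</title><body>Some body text here.</body></entry>';

$ratios = [];
foreach ([1, 10, 100] as $copies) {
    $input = str_repeat($fragment, $copies);
    $ratios[$copies] = strlen(gzdeflate($input, 9)) / strlen($input);
    printf("%3d copies: %5d bytes -> %.1f%% of original\n",
        $copies, strlen($input), 100 * $ratios[$copies]);
}
```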

You can read more about how different compression methods compare here: http://en.wikipedia.org/wiki/Data_compression

Answered by Dave Walker