I've been searching online and couldn't find an LZW decompression implementation in PHP that works with the data output by these JavaScript functions:

function lzw_encode(s) {
    var dict = {};
    var data = (s + "").split("");
    var out = [];
    var currChar;
    var phrase = data[0];
    var code = 256;
    for (var i=1; i<data.length; i++) {
        currChar=data[i];
        if (dict[phrase + currChar] != null) {
            phrase += currChar;
        }
        else {
            out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
            dict[phrase + currChar] = code;
            code++;
            phrase=currChar;
        }
    }
    out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
    for (var i=0; i<out.length; i++) {
        out[i] = String.fromCharCode(out[i]);
    }
    return out.join("");
}

function lzw_decode(s) {
    var dict = {};
    var data = (s + "").split("");
    var currChar = data[0];
    var oldPhrase = currChar;
    var out = [currChar];
    var code = 256;
    var phrase;
    for (var i=1; i<data.length; i++) {
        var currCode = data[i].charCodeAt(0);
        if (currCode < 256) {
            phrase = data[i];
        }
        else {
           phrase = dict[currCode] ? dict[currCode] : (oldPhrase + currChar);
        }
        out.push(phrase);
        currChar = phrase.charAt(0);
        dict[code] = oldPhrase + currChar;
        code++;
        oldPhrase = phrase;
    }
    return out.join("");
}

I really just need a decompression algorithm in PHP that works with the output of the JavaScript compression function above.

The lzw_encode function above encodes "This is a test of the compression function" as "This Ă a test ofĈhe comprĊsion functěn".
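To see what a PHP decoder actually has to read, it can help to look at the numeric codes behind that output. The sketch below repeats the encoder logic above but returns the codes as numbers instead of characters; `lzwCodes` is a hypothetical helper name used only for illustration. Any code of 256 or more is a dictionary entry, which `lzw_encode` emits as a single character via `String.fromCharCode`.

```javascript
// Hypothetical helper: same algorithm as lzw_encode above, but it returns
// the numeric codes instead of joining them into a string.
function lzwCodes(s) {
    var dict = {};
    var data = (s + "").split("");
    var out = [];
    var phrase = data[0];
    var code = 256;
    for (var i = 1; i < data.length; i++) {
        var currChar = data[i];
        if (dict[phrase + currChar] != null) {
            // Known phrase: keep extending it.
            phrase += currChar;
        } else {
            // Emit the code for the phrase so far and register the new phrase.
            out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
            dict[phrase + currChar] = code;
            code++;
            phrase = currChar;
        }
    }
    out.push(phrase.length > 1 ? dict[phrase] : phrase.charCodeAt(0));
    return out;
}

var codes = lzwCodes("This is a test of the compression function");

// Dictionary references are the codes that do not fit in one byte.
var dictCodes = codes.filter(function (c) { return c >= 256; });
console.log(dictCodes); // [ 258, 264, 266, 283 ]
```

Those four codes are the characters Ă, Ĉ, Ċ and ě in the sample output. Each of them takes two bytes in UTF-8, so a decoder that walks the string byte by byte will misread them; the PHP side has to iterate over characters and recover each character's code.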

The libraries I've found are either buggy (http://code.google.com/p/php-lzw/) or don't accept UTF-8 characters as input.

Any help would be greatly appreciated,

Thanks!

xd44
  • Why not use the JS from [link](http://rosettacode.org/wiki/LZW_compression#JavaScript)? There are ready-made PHP implementations for that online, e.g. [link](http://webdevwonders.com/lzw-compression-and-decompression-with-javascript-and-php/). – BogdanM Sep 25 '13 at 12:14
  • Why is i=1 in here: `for (var i=1; i – BogdanM Sep 25 '13 at 12:18

2 Answers

I've ported it to PHP and tested it for you:

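function lzw_decode($s) {
  // Treat the input as UTF-8 so the mb_* functions work on characters, not bytes.
  mb_internal_encoding('UTF-8');

  $dict = array();
  $currChar = mb_substr($s, 0, 1);
  $oldPhrase = $currChar;
  $out = array($currChar);
  $code = 256;
  $phrase = '';
  $len = mb_strlen($s);

  for ($i = 1; $i < $len; $i++) {
      $char = mb_substr($s, $i, 1);
      // Numeric code of the character: convert it to UTF-16BE, left-pad to
      // 4 bytes, and read the result as a 32-bit big-endian integer.
      $currCode = implode(unpack('N*', str_pad(iconv('UTF-8', 'UTF-16BE', $char), 4, "\x00", STR_PAD_LEFT)));
      if ($currCode < 256) {
          // Plain character, emitted as-is.
          $phrase = $char;
      } else {
          // Dictionary entry, or the case where the code is not defined yet.
          $phrase = isset($dict[$currCode]) ? $dict[$currCode] : ($oldPhrase . $currChar);
      }
      $out[] = $phrase;
      $currChar = mb_substr($phrase, 0, 1);
      $dict[$code] = $oldPhrase . $currChar;
      $code++;
      $oldPhrase = $phrase;
  }
  return implode('', $out);
}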
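The `iconv`/`unpack` line in the port above recovers the numeric code of each character, which is what `charCodeAt(0)` does in the JavaScript original. As a rough illustration in JavaScript (assuming a runtime where `TextDecoder` is available, such as a modern browser or a recent Node.js), the character Ă from the sample output arrives as two UTF-8 bytes but stands for the single code 258:

```javascript
// UTF-8 bytes of "Ă" (U+0102), as a byte-oriented reader would see them.
var bytes = new Uint8Array([0xC4, 0x82]);

// Decode the bytes into one character, then read its numeric code.
var ch = new TextDecoder("utf-8").decode(bytes);
var code = ch.charCodeAt(0);

console.log(bytes.length, ch.length, code); // 2 1 258
```

This is why the port uses `mb_substr` and `mb_strlen` rather than the byte-based `substr` and `strlen`.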
clover

There is now a PHP extension for this!

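// Decompress the .Z archive into a plain tar file (function provided by the extension).
lzw_decompress_file('3240_05_1948-1998.tar.Z', '3240_05_1948-1998.tar');
// Note: this absolute path matches the relative output path above only when the script runs from /tmp.
$archive = new PharData('/tmp/3240_05_1948-1998.tar');
mkdir('unpacked');
$archive->extractTo('unpacked');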
quickshiftin
  • That is nice, but might be a bit hard to use, since it forces the use of files. It cannot just decompress a string of data. It also requires a proper .Z file format (those 3 bytes at the beginning). – Veda Feb 23 '16 at 21:04
  • It's open source and it's a half day's worth of effort... I'll plan to add support for strings if there is some need expressed by the community, or feel free to send me a pull request ;) By the way, it's currently the best option available for PHP users wanting LZW compression, even if it forces you to use files. – quickshiftin Feb 23 '16 at 21:07