I'm just trying to write a really basic script that takes some input text and compresses it with LZW, using this package: http://packages.python.org/lzw/

I've never tried any encoding with Python before and am thoroughly confused =( I also can't find any documentation about it online, other than the package info.

Here's what I have:

import lzw

file = lzw.readbytes("collectemailinfo.txt", buffersize=1024)
enc = lzw.compress(file)
print enc

Any help or pointers of any kind would be much appreciated!

Thanks =)

sophanox
  • Is something wrong with what you have? You probably can't print it and expect it to be human readable, but you can save it somewhere. Your code looks correct to me. – Jordan Jul 26 '11 at 18:01
  • Ah right, thanks. I did try removing the "print enc" and replacing it with "lzw.writebytes(output.txt, enc)", but had no joy with that either =( – sophanox Jul 26 '11 at 18:04

1 Answer

Here is the package API: http://packages.python.org/lzw/lzw-module.html

You can read the pseudo code for compression and decompression here.

Is there anything else you are confused about?

Here is an example:

Python

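In this version the dictionaries hold mixed-type data: the first 256 entries map each single character to itself, while entries added later use integer codes.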

def compress(uncompressed):
    """Compress a string to a list of output symbols."""

    # Build the dictionary.
    dict_size = 256
    dictionary = dict((chr(i), chr(i)) for i in xrange(dict_size))
    # in Python 3: dictionary = {chr(i): chr(i) for i in range(dict_size)}

    w = ""
    result = []
    for c in uncompressed:
        wc = w + c
        if wc in dictionary:
            w = wc
        else:
            result.append(dictionary[w])
            # Add wc to the dictionary.
            dictionary[wc] = dict_size
            dict_size += 1
            w = c

    # Output the code for w.
    if w:
        result.append(dictionary[w])
    return result



def decompress(compressed):
    """Decompress a list of output symbols to a string."""

    # Build the dictionary.
    dict_size = 256
    dictionary = dict((chr(i), chr(i)) for i in xrange(dict_size))
    # in Python 3: dictionary = {chr(i): chr(i) for i in range(dict_size)}

    w = result = compressed.pop(0)
    for k in compressed:
        if k in dictionary:
            entry = dictionary[k]
        elif k == dict_size:
            entry = w + w[0]
        else:
            raise ValueError('Bad compressed k: %s' % k)
        result += entry

        # Add w+entry[0] to the dictionary.
        dictionary[dict_size] = w + entry[0]
        dict_size += 1

        w = entry
    return result

How to use:

compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print(compressed)
decompressed = decompress(compressed)
print(decompressed)

Output:

['T', 'O', 'B', 'E', 'O', 'R', 'N', 'O', 'T', 256, 258, 260, 265, 259, 261, 263]
TOBEORNOTTOBEORTOBEORNOT

NOTE: this example is taken from here.
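
Since the original goal was to save the compressed result rather than print it, here is a minimal Python 3 sketch of one way to do that. It is a variant of the functions above that uses integer codes only, which makes the output easy to pack into bytes. The names lzw_compress, lzw_decompress, pack_codes and unpack_codes are made up for this illustration and are not part of the lzw package. It assumes every input character has a code point below 256 and that no code exceeds 65535.

```python
import struct


def lzw_compress(text):
    """Return a list of integer LZW codes for text.

    Assumption: every character in text has a code point below 256.
    """
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    codes = []
    for c in text:
        wc = w + c
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            # Register the new sequence under the next free code.
            dictionary[wc] = next_code
            next_code += 1
            w = c
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decompress(codes):
    """Rebuild the original string from a list of integer LZW codes."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    parts = [w]
    for k in codes[1:]:  # slicing leaves the caller's list untouched
        if k in dictionary:
            entry = dictionary[k]
        elif k == next_code:
            # The code refers to the entry that is about to be created.
            entry = w + w[0]
        else:
            raise ValueError("Bad compressed code: %s" % k)
        parts.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(parts)


def pack_codes(codes):
    """Pack codes as fixed-width 16-bit big-endian integers.

    Assumption: all codes are below 65536.
    """
    return struct.pack(">%dH" % len(codes), *codes)


def unpack_codes(data):
    """Inverse of pack_codes."""
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


if __name__ == "__main__":
    codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
    data = pack_codes(codes)
    print(codes)
    print(lzw_decompress(unpack_codes(data)))
```

The bytes returned by pack_codes can be written to a file opened in "wb" mode. Fixed 16-bit codes are the simplest packing, not the most compact one: for a very short input like this the packed output is larger than the text itself, and practical LZW implementations usually write variable-width codes instead.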

dda
Saher Ahwal
  • Thanks for the help. Yeah, I've been trying to use the info from the package API site, but failing so far =( With regard to the second link, should I just translate the pseudo code into Python and use that? – sophanox Jul 26 '11 at 18:08
  • I'll have to properly read through it and try to understand what's going on. It's obviously far more complicated than I initially thought; I thought a simple 5-6 line script would cover it! I shall have a go now, thanks a lot Saher! – sophanox Jul 26 '11 at 18:12
  • No problem. Glad I could help. I can also help you understand the code if you have any issues with it. – Saher Ahwal Jul 26 '11 at 18:19