Image compression in python

Question

For my image Compression, I am using the pillow library to get every pixel in rgb (for ex: (100, 0, 200). Using the Huffman encoding I already convert to binary to reduce the number of bits. For now, I have to save the sequence of bits into a text or binary file. The compress files to be consistently smaller than original, but for now, my txt file is larger than the original. What should I do ? And after that how can I read the file and decompress it. Here is an instruction:

Your code should read in an image file, compute how many bits are required for a fixed length encoding and then apply a compression algorithm to create a smaller encoding – you need to implement the compression, you cannot use a compression library. You should output how many bits are required to store the image in your compressed format as well as the compression ratio achieved. When it comes to saving your compressed image, you won’t be able to save it as a standard image format, since you will have created your own encoding, but you can save the sequence of bits into a text or binary file.

Your code should also be able to prompt the user for the filename of a text file containing a compressed sequence of bits and then decompress that file into the original image – you can assume that the file uses the same compression format as the last file you compressed. So, for example, if you compressed pacificat.bmp into a series of bits stored in pacificat.txt and then the user asked you to decompress alt_encode.txt, you could assume that alt_pacificat.txt used the same compression data structure as encode.txt (it might be a subset of the data from the original image, for example).

There are a number of libraries that can help you store formatted data into a file from Python. If you research the options and find a way to store your compression data structure into a file, such that the user can select both a bit file and a data structure file and use the data structure to decompress the bit file

just use my current image: flag2.bmp

here is my code

from PIL import  Image
import sys, string
import copy
import time


codes   = {}
def sortFreq (freqs) :
    letters = freqs.keys()
    tuples = []
    for let in letters :
        tuples.append((freqs[let],let))
    tuples.sort()
    return tuples

def buildTree(tuples) :
    while len(tuples) > 1 :
        leastTwo = tuple(tuples[0:2])                  # get the 2 to combine
        theRest  = tuples[2:]                          # all the others
        combFreq = leastTwo[0][0] + leastTwo[1][0]     # the branch points freq
        tuples   = theRest + [(combFreq,leastTwo)]     # add branch point to the end
        tuples.sort()                                  # sort it into place
    return tuples[0]            # Return the single tree inside the list

def trimTree (tree) :
     # Trim the freq counters off, leaving just the letters
    p = tree[1]                                    # ignore freq count in [0]
    if type(p) == type("") : return p              # if just a leaf, return it
    else : return (trimTree(p[0]), trimTree(p[1])) # trim left then right and recombine

def assignCodes(node, pat=''):
    global codes
    if type(node) == type("") :
        codes[node] = pat                # A leaf. set its code
    else  :                              #
        assignCodes(node[0], pat+"0")    # Branch point. Do the left branch
        assignCodes(node[1], pat+"1")    # then do the right branch.


start = time.time()
dictionary = {}
table = {}
image = Image.open('flag2.bmp')
#image.show()
width, height = image.size
px= image.load()

totalpixel = width*height
print("Total pixel: "+ str(totalpixel))

for x in range(width):
    for y in range(height):
       # print(px[x, y])
        for i in range(3):

            if dictionary.get(str(px[x, y][i])) is None:
                dictionary[str(px[x, y][i])] = 1
            else:
                dictionary[str(px[x, y][i])] = dictionary[str(px[x, y][i])] +1
table = copy.deepcopy(dictionary)

def encode2 (str) :
    global codes
    output = ""
    for ch in str : output += codes[ch]
    return output

def decode (tree, str) :
    output = ""
    p = tree
    for bit in str :
        if bit == '0' : p = p[0]     # Head up the left branch
        else          : p = p[1]     # or up the right branch
        if type(p) == type("") :
            output += p              # found a character. Add to output
            p = tree                 # and restart for next character
    return output

combination = len(dictionary)
for value in table:
    table[value] = table[value] / (totalpixel * combination) * 100
print(table)

print(dictionary)
sortdic = sortFreq(dictionary)

tree = buildTree(sortdic)
print("tree")
print(tree)
trim = trimTree(tree)
print("trim")
print(trim)
print("assign 01")
assignCodes(trim)
print(codes)
empty_tuple = ()
f = open("answer.txt","w")

for x in range(width):
    for y in range(height):
        list = []
        list.append(codes[str(px[x, y][0])])
        list.append(codes[str(px[x, y][1])])
        list.append(codes[str(px[x, y][2])])
        print(str(px[x, y]) + ": " +str(list))
        f.write(str(list))

print("decode test:", str(decode (trim, "1100")))


stop = time.time()
times = (stop - start) * 1000
print("Run time takes %d miliseconds" % times)

[flag2.bmp][1]

Raw RGB pixel is represented by 3 bytes (24 bits). Your algorithm encodes the first pixel as the following ASCII string: `['000', '0010', '0011']` -- 23 **bytes**. The 2 spaces are outright useless. Since data is just the 0s and 1s, the 6 apostrophes are redundant. Since you're writing prefix codes, the commas and brackets are also redundant. In total, that's 12 bytes per pixel that carry no information at all. The remaining 11 bytes (in this case) carry some information... but how much? If you notice, the only two possible symbols in the output alphabet are 0 and 1. ... — Dan Mašek, Nov 17 '19 at 14:08
That means each symbol carries 1 bit of information. Since you store each symbol as ASCII character, you use 8 bits for each 1 bit of information. Put together, in case, you used 184 bits to represent 11 bits of information -- ~16.7x more than necessary. | You need to store the compressed output as a bitstream. — Dan Mašek, Nov 17 '19 at 14:16
Some further considerations once you solve the encoding inefficiency: A) Try treating each channel independently (separate Huffman tables for red, green and blue). In this case there are only 2 distinct intensities per channel -- i.e. you could code each channel using only 1 bit, for total of 3 bits per pixel. B) There are very few distinct colours in the image (in fact, just 2). You could encode it as a palette-based image. That would mean 1 bit per pixel + some overhead to store the palette. — Dan Mašek, Nov 17 '19 at 14:26
Oh, and one more thing -- currently, your compressed file only hold the codes for the individual pixel values. However, to be able to reconstruct it, there's some additional information necessary. First of all, you need to store the size (width and height) of the image. Second, to decode the compressed stream, you need the Huffman trees -- so you either need to store the trees themselves, or whatever information is necessary to re-generate them on the decoder side. — Dan Mašek, Nov 17 '19 at 18:00

Dan Mašek · Accepted Answer · 2019-11-18T02:57:21.953

Code Cleanup

Let's try to refactor your code a little, taking advantage of algorithms provided by Python standard library, while keeping to the spirit of your approach to Huffman tree calculation and image encoding.

Calculating Symbol Counts

First of all, we can refactor the symbol counting into a function and rewrite it in more concise way:

Use Image.getdata() to iterate over all the pixels in the image
Since each pixel is represented by a tuple, use itertools.chain.from_iterable to get a flattened view of intensities.
Take advantage of collections.Counter to get the symbol (intensity counts)

Additionally, we can change it to return a list of (symbol, count), sorted in ascending order by (count, symbol). To do so, we can combine it with a rewritten version of your sortFreq(...) function, taking advantage of:

Python sorted(...) function (which allows us to define the key to sort by), together with
Tuple slicing to reverse the (symbol, count) tuples for sorting

Implementation:

from collections import Counter
from itertools import chain

def count_symbols(image):
    pixels = image.getdata()
    values = chain.from_iterable(pixels)
    counts = Counter(values).items()
    return sorted(counts, key=lambda x:x[::-1])

Building the Tree

Only a small change is needed here -- since we already have the symbol counts sorted, we just need to reverse the tuples to let your existing tree-building algorithm to work. We can use list comprehension together with tuple slicing to express this concisely.

Implementation:

def build_tree(counts) :
    nodes = [entry[::-1] for entry in counts] # Reverse each (symbol,count) tuple
    while len(nodes) > 1 :
        leastTwo = tuple(nodes[0:2]) # get the 2 to combine
        theRest = nodes[2:] # all the others
        combFreq = leastTwo[0][0] + leastTwo[1][0]  # the branch points freq
        nodes = theRest + [(combFreq, leastTwo)] # add branch point to the end
        nodes.sort() # sort it into place
    return nodes[0] # Return the single tree inside the list

Trimming the Tree

Again, just two small changes from your original implementation:

Change the test to check for tuple (node), to be independent of how a symbol is represented.
Get rid of the unnecessary else

Implementation:

def trim_tree(tree) :
    p = tree[1] # Ignore freq count in [0]
    if type(p) is tuple: # Node, trim left then right and recombine
        return (trim_tree(p[0]), trim_tree(p[1]))
    return p # Leaf, just return it

Assigning Codes

The most important change here is to eliminate the reliance on a global codes variable. To resolve it, we can split the implementation into two functions, one which handles the recursive code assignment, and a wrapper which creates a new local codes dictionary, dispatches the recursive function on it, and returns the output.

Let's also switch the representation of codes from strings to lists of bits (integers in range [0,1]) -- the usefulness of this will be apparent later.

Once more, we'll change the test to check for tuples (for same reason as when trimming).

Implementation:

def assign_codes_impl(codes, node, pat):
    if type(node) == tuple:
        assign_codes_impl(codes, node[0], pat + [0]) # Branch point. Do the left branch
        assign_codes_impl(codes, node[1], pat + [1]) # then do the right branch.
    else:
        codes[node] = pat # A leaf. set its code

def assign_codes(tree):
    codes = {}
    assign_codes_impl(codes, tree, [])
    return codes

Encoding

Let's make a small detour, and talk about encoding of the data.

First of all, let's observe that a raw RGB pixel is represented by 3 bytes (one for each colour channel. That's 24 bits per pixel, and forms our baseline.

Now, your current algorithm encodes the first pixel as the following ASCII string:

['000', '0010', '0011']

That's 23 bytes in total (or 184 bits). That's much, much worse than raw. Let's examine why:

There are two spaces, which just make it more readable to a human. Those carry no information. (2 bytes)
Each of the three codes is delimited by two apostrophes. Since the codes only consist of 0s and 1s, the apostrophes are unnecessary for parsing, and thus also carry no information. (6 bytes)
Each of the codes is a prefix code, therefore they can be parsed unambiguously, and thus the two commas used for code separation are also unnecessary. (2 bytes)
We know there are three codes per pixel, so we don't need the braces ([,]) to delimit pixels either (for same reason as above). (2 bytes)

In total, that's 12 bytes per pixel that carry no information at all. The remaining 11 bytes (in this particular case) do carry some information... but how much?

Notice that the only two possible symbols in the output alphabet are 0 and 1. That means that each symbol carries 1 bit of information. Since you store each symbol as ASCII character (a byte), you use 8 bits for each 1 bit of information.

Put together, in this particular case, you used 184 bits to represent 11 bits of information -- ~16.7x more than necessary, and ~7.67x worse than just storing the pixels in raw format.

Obviously, using a naive text representation of the encoded data will not yield any compression. We will need a better approach.

Bitstreams

From our earlier analysis, it becomes evident that in order to perform compression (and decompression) effectively, we need to be able to treat our output (or input) as a stream of individual bits. The standard Python libraries do not provide a direct solution to do this -- at the lowest granularity, we can only read or write a file one byte at a time.

Since we want to encode values that may consist of multiple bits, it's essential to decode on how they shall be ordered based on significance. Let's order them from the most significant to the least significant.

Bit I/O Utilities

As mentioned earlier, we shall represent a sequence of bits as a list of integers in range [0,1]. Let's start by writing some simple utility functions:

A function that converts an integer into the shortest sequence of bits that uniquely represents it (i.e. at least 1 bit, but otherwise no leading zeros).
A function that converts a sequence of bits into an integer.
A function that zero-extends (adds zeros to most significant positions) a sequence of bits (to allow fixed-length encoding).

Implementation:

def to_binary_list(n):
    """Convert integer into a list of bits"""
    return [n] if (n <= 1) else to_binary_list(n >> 1) + [n & 1]

def from_binary_list(bits):
    """Convert list of bits into an integer"""
    result = 0
    for bit in bits:
        result = (result << 1) | bit
    return result

def pad_bits(bits, n):
    """Prefix list of bits with enough zeros to reach n digits"""
    assert(n >= len(bits))
    return ([0] * (n - len(bits)) + bits)

Example Usage:

>>> to_binary_list(14)
[1, 1, 1, 0]
>>> from_binary_list([1,1,1,0])
14
>>> pad_bits(to_binary_list(14),8)
[0, 0, 0, 0, 1, 1, 1, 0]

Output Bitstream

Since the file I/O API allows us to save only whole bytes, we need to create a wrapper class that will buffer the bits written into a stream in memory.

Let's provide means to write a single bit, as well as a sequence of bits.

Each write command (of 1 or more bits) will first add the bits into the buffer. Once the buffer contains more than 8 bits, groups of 8 bits are removed from the front, converted to an integer in range [0-255] and saved to the output file. This is done until the buffer contains less than 8 bits.

Finally, let's provide a way to "flush" the stream -- when the buffer is non-empty, but doesn't contain enough bits to make a whole byte, add zeros to the least significant position until there are 8 bits, and then write the byte. We need this when we're closing the bitstream (and there are some other benefits that we'll see later).

Implementation:

class OutputBitStream(object): 
    def __init__(self, file_name): 
        self.file_name = file_name
        self.file = open(self.file_name, 'wb') 
        self.bytes_written = 0
        self.buffer = []

    def write_bit(self, value):
        self.write_bits([value])

    def write_bits(self, values):
        self.buffer += values
        while len(self.buffer) >= 8:
            self._save_byte()        

    def flush(self):
        if len(self.buffer) > 0: # Add trailing zeros to complete a byte and write it
            self.buffer += [0] * (8 - len(self.buffer))
            self._save_byte()
        assert(len(self.buffer) == 0)

    def _save_byte(self):
        bits = self.buffer[:8]
        self.buffer[:] = self.buffer[8:]

        byte_value = from_binary_list(bits)
        self.file.write(bytes([byte_value]))
        self.bytes_written += 1

    def close(self): 
        self.flush()
        self.file.close()

Input Bitstream

Input bit stream follows similar theme. We want to read 1 or more bits at a time. To do so, we load bytes from the file, convert each byte to a list of bits and add it to the buffer, until there are enough to satisfy the read request.

The flush command in this case purges the buffer (assuring it contains only zeros).

Implementation:

class InputBitStream(object): 
    def __init__(self, file_name): 
        self.file_name = file_name
        self.file = open(self.file_name, 'rb') 
        self.bytes_read = 0
        self.buffer = []

    def read_bit(self):
        return self.read_bits(1)[0]

    def read_bits(self, count):
        while len(self.buffer) < count:
            self._load_byte()
        result = self.buffer[:count]
        self.buffer[:] = self.buffer[count:]
        return result

    def flush(self):
        assert(not any(self.buffer))
        self.buffer[:] = []

    def _load_byte(self):
        value = ord(self.file.read(1))
        self.buffer += pad_bits(to_binary_list(value), 8)
        self.bytes_read += 1

    def close(self): 
        self.file.close()

Compressed Format

Next we need to define the format of our compressed bitstream. There are three essential chunks of information that are needed to decode the image:

The shape of the image (height and width), with the assumption that it's a 3-channel RGB image.
Information necessary to reconstruct the Huffman codes on the decode side
Huffman-encoded pixel data

Let's make our compressed format as follows:

Header
- Image height (16 bits, unsigned)
- Image width (16 bits, unsigned)
Huffman table (beginning aligned to whole byte)
- See this for the algorithm.
Pixel codes (beginning aligned to whole byte)
- width * height * 3 Huffman codes in sequence

Compression

Implementation:

from PIL import Image

def compressed_size(counts, codes):
    header_size = 2 * 16 # height and width as 16 bit values

    tree_size = len(counts) * (1 + 8) # Leafs: 1 bit flag, 8 bit symbol each
    tree_size += len(counts) - 1 # Nodes: 1 bit flag each
    if tree_size % 8 > 0: # Padding to next full byte
        tree_size += 8 - (tree_size % 8)

    # Sum for each symbol of count * code length
    pixels_size = sum([count * len(codes[symbol]) for symbol, count in counts])
    if pixels_size % 8 > 0: # Padding to next full byte
        pixels_size += 8 - (pixels_size % 8)

    return (header_size + tree_size + pixels_size) / 8

def encode_header(image, bitstream):
    height_bits = pad_bits(to_binary_list(image.height), 16)
    bitstream.write_bits(height_bits)    
    width_bits = pad_bits(to_binary_list(image.width), 16)
    bitstream.write_bits(width_bits)

def encode_tree(tree, bitstream):
    if type(tree) == tuple: # Note - write 0 and encode children
        bitstream.write_bit(0)
        encode_tree(tree[0], bitstream)
        encode_tree(tree[1], bitstream)
    else: # Leaf - write 1, followed by 8 bit symbol
        bitstream.write_bit(1)
        symbol_bits = pad_bits(to_binary_list(tree), 8)
        bitstream.write_bits(symbol_bits)

def encode_pixels(image, codes, bitstream):
    for pixel in image.getdata():
        for value in pixel:
            bitstream.write_bits(codes[value])

def compress_image(in_file_name, out_file_name):
    print('Compressing "%s" -> "%s"' % (in_file_name, out_file_name))
    image = Image.open(in_file_name)
    print('Image shape: (height=%d, width=%d)' % (image.height, image.width))
    size_raw = raw_size(image.height, image.width)
    print('RAW image size: %d bytes' % size_raw)
    counts = count_symbols(image)
    print('Counts: %s' % counts)
    tree = build_tree(counts)
    print('Tree: %s' % str(tree))
    trimmed_tree = trim_tree(tree)
    print('Trimmed tree: %s' % str(trimmed_tree))
    codes = assign_codes(trimmed_tree)
    print('Codes: %s' % codes)

    size_estimate = compressed_size(counts, codes)
    print('Estimated size: %d bytes' % size_estimate)

    print('Writing...')
    stream = OutputBitStream(out_file_name)
    print('* Header offset: %d' % stream.bytes_written)
    encode_header(image, stream)
    stream.flush() # Ensure next chunk is byte-aligned
    print('* Tree offset: %d' % stream.bytes_written)
    encode_tree(trimmed_tree, stream)
    stream.flush() # Ensure next chunk is byte-aligned
    print('* Pixel offset: %d' % stream.bytes_written)
    encode_pixels(image, codes, stream)
    stream.close()

    size_real = stream.bytes_written
    print('Wrote %d bytes.' % size_real)

    print('Estimate is %scorrect.' % ('' if size_estimate == size_real else 'in'))
    print('Compression ratio: %0.2f' % (float(size_raw) / size_real))

Decompression

Implementation:

from PIL import Image

def decode_header(bitstream):
    height = from_binary_list(bitstream.read_bits(16))
    width = from_binary_list(bitstream.read_bits(16))
    return (height, width)

# https://stackoverflow.com/a/759766/3962537
def decode_tree(bitstream):
    flag = bitstream.read_bits(1)[0]
    if flag == 1: # Leaf, read and return symbol
        return from_binary_list(bitstream.read_bits(8))
    left = decode_tree(bitstream)
    right = decode_tree(bitstream)
    return (left, right)

def decode_value(tree, bitstream):
    bit = bitstream.read_bits(1)[0]
    node = tree[bit]
    if type(node) == tuple:
        return decode_value(node, bitstream)
    return node

def decode_pixels(height, width, tree, bitstream):
    pixels = bytearray()
    for i in range(height * width * 3):
        pixels.append(decode_value(tree, bitstream))
    return Image.frombytes('RGB', (width, height), bytes(pixels))

def decompress_image(in_file_name, out_file_name):
    print('Decompressing "%s" -> "%s"' % (in_file_name, out_file_name))

    print('Reading...')
    stream = InputBitStream(in_file_name)
    print('* Header offset: %d' % stream.bytes_read)
    height, width = decode_header(stream)
    stream.flush() # Ensure next chunk is byte-aligned
    print('* Tree offset: %d' % stream.bytes_read)    
    trimmed_tree = decode_tree(stream)
    stream.flush() # Ensure next chunk is byte-aligned
    print('* Pixel offset: %d' % stream.bytes_read)
    image = decode_pixels(height, width, trimmed_tree, stream)
    stream.close()
    print('Read %d bytes.' % stream.bytes_read)

    print('Image size: (height=%d, width=%d)' % (height, width))
    print('Trimmed tree: %s' % str(trimmed_tree))
    image.save(out_file_name)

Test Run

from PIL import ImageChops

def raw_size(width, height):
    header_size = 2 * 16 # height and width as 16 bit values
    pixels_size = 3 * 8 * width * height # 3 channels, 8 bits per channel
    return (header_size + pixels_size) / 8

def images_equal(file_name_a, file_name_b):
    image_a = Image.open(file_name_a)
    image_b = Image.open(file_name_b)

    diff = ImageChops.difference(image_a, image_b)

    return diff.getbbox() is None

if __name__ == '__main__':
    start = time.time()

    compress_image('flag.png', 'answer.txt')

    print('-' * 40)

    decompress_image('answer.txt', 'flag_out.png')

    stop = time.time()
    times = (stop - start) * 1000

    print('-' * 40)

    print('Run time takes %d miliseconds' % times)
    print('Images equal = %s' % images_equal('flag.png', 'flag_out.png'))

I ran the script with the sample image you provided.

Console Output:

Compressing "flag.png" -> "answer.txt"
Image shape: (height=18, width=23)
RAW image size: 1246 bytes
Counts: [(24, 90), (131, 90), (215, 90), (59, 324), (60, 324), (110, 324)]
Tree: (1242, ((594, ((270, ((90, 215), (180, ((90, 24), (90, 131))))), (324, 59))), (648, ((324, 60), (324, 110)))))
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
Codes: {215: [0, 0, 0], 24: [0, 0, 1, 0], 131: [0, 0, 1, 1], 59: [0, 1], 60: [1, 0], 110: [1, 1]}
Estimated size: 379 bytes
Writing...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Wrote 379 bytes.
Estimate is correct.
Compression ratio: 3.29
----------------------------------------
Decompressing "answer.txt" -> "flag_out.png"
Reading...
* Header offset: 0
* Tree offset: 4
* Pixel offset: 12
Read 379 bytes.
Image size: (height=18, width=23)
Trimmed tree: (((215, (24, 131)), 59), (60, 110))
----------------------------------------
Run time takes 32 miliseconds
Images equal = True

Potential Improvements

Huffman table per colour channel
Palette image support
Transformation filter (delta coding per channel, or more sophisticated predictor)
Model to handle repetitions (RLE, LZ...)
Canonical Huffman tables

I wish CS-related textbooks had explained algorithms like this answer. thank you Dan. — HadiRj, Jun 17 '21 at 12:03
Thank you for the answer! I get an error though, stating `TypeError: '<' not supported between instances of 'tuple' and 'int'` in the \ `build_tree` function, from the `node.sort` line, any idea on how to solve this ? — Achille G, May 24 '22 at 13:32
(With any bigger .png the error appears, though it works just fine with the image provided) — Achille G, May 24 '22 at 14:36

Image compression in python

1 Answers1

Code Cleanup

Calculating Symbol Counts

Building the Tree

Trimming the Tree

Assigning Codes

Encoding

Bitstreams

Bit I/O Utilities

Output Bitstream

Input Bitstream

Compressed Format

Compression

Decompression

Test Run

Potential Improvements