
If I insert an image of size (205, 205), the algorithm compresses it very quickly, but if I insert a larger image, compression takes a very long time. I would like to optimize the code and speed up the compression phase. Do you have any suggestions?

**Libraries**

```python
from PIL import Image
import numpy as np
import cv2  # `from cv2 import cv2` is not a valid import
```

**Compression function**

```python
def lz77Compress(image, sw, lab):
    img = cv2.imread(image)
    flat = np.array(img).flatten()
    row = img.shape[0]
    col = img.shape[1]
    ch = img.shape[2]
    tot = row * col * ch
    slidingWindow = sw
    lookAhead = lab

    # arrays of (offset, length) tuples and literal characters
    encodedTuple = np.array([], dtype=np.uint16)
    encodedChar = np.array([], dtype=np.uint8)

    # length of the search buffer
    sbSize = slidingWindow - lookAhead
    for it in range(sbSize):
        encodedTuple = np.append(encodedTuple, (0, 0))
        encodedChar = np.append(encodedChar, flat[it])

    # pointer into the search buffer
    sbPointer = 0
    while sbPointer < tot:
        max_match = 0
        max_match_go_back = 0
        selChar = sbPointer + sbSize
        # character in the look-ahead buffer to match against the search buffer
        encodeCharacter = flat[selChar]

        # positions in the search buffer where the character occurs
        seqY = np.array([], dtype=np.int16)
        for i in range(sbPointer, sbPointer + sbSize):
            if flat[i] == encodeCharacter:
                seqY = np.append(seqY, i)

        # check for a match and append to encodedTuple and encodedChar
        if seqY.size == 0:
            encodedTuple = np.append(encodedTuple, (0, 0))
            encodedChar = np.append(encodedChar, encodeCharacter)
        else:
            for j in seqY:
                # length of the match starting at position j
                matchLength = 0
                returnBack = selChar - j
                it = 0
                while selChar + it < tot:
                    if flat[it + j] == flat[selChar + it]:
                        matchLength += 1
                        it += 1
                    # no further correspondence
                    else:
                        break
                if matchLength > max_match:
                    max_match = matchLength
                    max_match_go_back = returnBack  # the original had this assignment reversed

            encodedTuple = np.append(encodedTuple, (max_match_go_back, max_match))
            encodedChar = np.append(encodedChar, flat[selChar + max_match - 1])

        sbPointer += 1 + max_match

    # save encodedTuple, encodedChar and the image size for decompression
    np.save("encodedTuple", encodedTuple)
    np.save("encodedChar", encodedChar)
    print("Compressed file written to: Compressed.txt")
    with open("Compressed.txt", "w+") as output:
        # the original wrote str(c), but `c` was undefined; assuming the image shape was intended
        output.write(str((row, col, ch)))
```
    
  • To begin, fix your code example to be syntactically valid (i.e. no mis-indented lines, no lines like `**#sequenza vuota[]**`, and so on). | In order to identify the performance bottlenecks, you should profile your code. There is plenty of information around on how to do that in Python. – Dan Mašek Apr 04 '21 at 12:11
  • That said, I see a lot of calls to [`np.append`](https://numpy.org/doc/stable/reference/generated/numpy.append.html). The documentation of this function says: "Note that append does not occur in-place: a new array is allocated and filled." -- that means that on every call, a new array (enlarged by one element) is allocated, all the data from the original is copied over, the new element stored to the last position, and the old array is discarded. The longer the array, the more time each copy takes (overall it's O(n^2)). – Dan Mašek Apr 04 '21 at 12:21
  • What could I use instead of np.append? – giangri_93 Apr 06 '21 at 09:51
  • Simplest thing would be to use a regular Python list, and append to that. Only at the end, convert the list to a numpy array. Based on quick test, that's about 50x faster on my machine. – Dan Mašek Apr 06 '21 at 11:18
  • https://github.com/Anto-9393/project-multimedia-lz77 I made some changes, the algorithm has speeded up but now the final image is only half visible. Any suggestions? – giangri_93 Apr 06 '21 at 17:36
  • The problem is that you changed `np.append(encodedTuple, (0, 0))` to `encodedTuple.append(0)` -- i.e. where you originally appended two zeros to the array, you now append just one. However, the decoder still expects to read a pair of numbers. Normally, this sort of mistake would result in garbage being produced; however, in your case there aren't any matches, so `encodedTuple` is still just an array of zeros, just half as many as there should be. And since the decoding loop stops when it reaches the end of this array, you get only half the pixel values decoded. – Dan Mašek Apr 08 '21 at 12:21
  • That however uncovers a much more serious problem in your code -- the encoder never finds a match, even when processing an all-black image (i.e. this branch never happens: https://github.com/Anto-9393/project-multimedia-lz77/blob/main/image.py#L44-L62). I haven't yet looked deeper into that part though. – Dan Mašek Apr 08 '21 at 12:25
  • Before handling a black-and-white image, I would like the algorithm to work with a color image. But if I go back to `np.append`, the algorithm will take a long time to process an image. – giangri_93 Apr 08 '21 at 17:08
  • Here, I've reworked the whole thing: https://pastebin.com/sR5TniNj | And here is your current code, with just minimum fixes to stop it from decompressing just half the data: https://pastebin.com/djCAAQ0g (you might have to fix some imports). | Problem is that the code to find the matches is even a bigger bottleneck... at least the naive implementation of it I've shown. – Dan Mašek Apr 08 '21 at 17:17
  • I don't know how to thank you =) I wanted to ask you the last thing: if I wanted to process a file.txt how could I do without changing algorithm? – giangri_93 Apr 08 '21 at 18:21
  • Just treat the file as an array of bytes -- `input_buffer = np.fromfile('some_file.txt', dtype=np.uint8)`. Then you can process that the same way you processed the array of bytes representing pixels of an image (`flat` in your code). – Dan Mašek Apr 08 '21 at 18:44
  • Just in case you're curious, here's an implementation of [Knuth-Morris-Pratt](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm) pattern matching function: https://pastebin.com/WRPyyivR It's a small improvement (takes about 1/3 of the time as the naive approach). There are more approaches (see http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.8587&rep=rep1&type=pdf) but generally, implementing this in Python you're already fighting an uphill battle. – Dan Mašek Apr 08 '21 at 19:17
  • Perfect. But in the code you corrected, if I wanted to accept a black-and-white image or a txt file in addition to a color image, should I add an `if` before the `lz77Compress` function? – giangri_93 Apr 08 '21 at 20:02
  • Let's continue this in a [chat](https://chat.stackoverflow.com/rooms/230916/lzz-77-compression-image-rgb), that's going to be better than an endless stream of comments... – Dan Mašek Apr 08 '21 at 21:48
  • hello, sorry but I can't answer you in the chat because I'm new to the stack and I don't have 20 points of reputation – giangri_93 Apr 09 '21 at 07:03
  • Ah, I was wondering why the system didn't offer the chat earlier. Anyway, now you should have enough rep for it. – Dan Mašek Apr 09 '21 at 09:09
  • Thanks for the reputation points, I wrote to you in chat – giangri_93 Apr 09 '21 at 11:38
  • @DanMašek : i am having trouble with this lzw compression algorithm for images. Could you tell me why I get this error: https://stackoverflow.com/questions/67043929/valueerror-cannot-reshape-array-of-size-1-into-shape-500-700-3 – giangri_93 Apr 11 '21 at 10:38
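
The `np.append` bottleneck described in the comments can be illustrated with a minimal sketch (hypothetical helper names; the ~50x speedup mentioned above will vary by machine and input size):

```python
import numpy as np

def encode_with_np_append(values):
    # O(n^2) overall: every np.append allocates a new array and copies all existing data
    out = np.array([], dtype=np.uint8)
    for v in values:
        out = np.append(out, v)
    return out

def encode_with_list(values):
    # O(n) overall: list.append is amortized constant time; convert to an array once at the end
    out = []
    for v in values:
        out.append(v)
    return np.array(out, dtype=np.uint8)

data = list(range(256)) * 4
assert np.array_equal(encode_with_np_append(data), encode_with_list(data))
```

Both helpers produce the same values; only the allocation pattern differs, which is why the list version scales linearly while the `np.append` version degrades quadratically on larger images.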
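
The Knuth-Morris-Pratt matcher mentioned in the comments can be sketched roughly as follows (an illustrative textbook implementation, not the exact pastebin code linked above):

```python
def kmp_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    # failure table: for each prefix, the length of its longest proper prefix
    # that is also a suffix
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    # scan the text once, never re-examining already-matched characters
    k = 0
    for i, c in enumerate(text):
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - len(pattern) + 1
    return -1

# works on byte/pixel sequences as well as strings:
# kmp_find([1, 2, 1, 2, 3], [1, 2, 3])  -> 2
```

Because it accepts any indexable sequence, the same function can search the flattened pixel array (`flat` in the question's code) for the longest-match candidates instead of the naive character-by-character scan.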
