
I have written some code to read the RGB values for each pixel of ~150 images (1000px by 720px, cropped and resized).

import os
from PIL import Image
print("STACKING IMAGES...")
os.chdir('cropped')
images=os.listdir() #list all images present in directory
print("GETTING IMAGES...")
channelR=[]
channelG=[]
channelB=[]
print("GETTING PIXEL INFORMATION...")  #runs reasonably fast
for image in images:  #loop through each image to extract RGB channels as separate lists
    with Image.open(image) as img:
        if image==images[0]:
            imgSize=img.size
        channelR.append(list(img.getdata(0)))
        channelG.append(list(img.getdata(1)))
        channelB.append(list(img.getdata(2)))
print("PIXEL INFORMATIION COLLECTED.")
print("AVERAGING IN CHANNEL RED.") #average for each pixel in each channel
avgR=[round(sum(x)/len(channelR)) for x in zip(*channelR)] #unzip each pixel across all ~150 images and average it; starts to slow
print("AVERAGING IN CHANNEL GREEN.")
avgG=[round(sum(x)/len(channelG)) for x in zip(*channelG)] #slower
print("AVERAGING IN CHANNEL BLUE.")
avgB=[round(sum(x)/len(channelB)) for x in zip(*channelB)] #progressively slower
print("MERGING DATA ACROSS THREE CHANNELS.")
mergedData=[x for x in zip(avgR, avgG, avgB)]  #merge averaged colour channels pixel by pixel; doesn't seem to end, takes an eternity
print("GENERATING IMAGE.")
stacked=Image.new('RGB', (imgSize)) #create image
stacked.putdata(mergedData) #generate image
stacked.show()
os.chdir('..')
stacked.save('stacked.tif', 'TIFF') #save file
print("FINISHED STACKING !")

Running it on my modestly equipped computer (Core2Duo, 4 GB RAM, Linux Mint) took close to an hour for the averaging across the three channels to complete, and the merge of the averaged pixels ran for a further hour without finishing before I aborted the process. I have read that list comprehensions are slow and that the zip() function takes up too much memory, but tinkering with those introduced further bugs. I have also read that partitioning the program into functions might speed it up.

For comparable performance figures, I would kindly request anyone answering to run the code on the images from https://github.com/rlvaugh/Impractical_Python_Projects/tree/master/Chapter_15/video_frames.

Any help on speeding up the program would be gratefully accepted. Would shifting to a more powerful system have any chance of improving its speed drastically?

Thank you in advance for any help.

Ranjit Pal
  • You might save some time if you did not loop through every pixel of every image twice. Instead of just appending the complete channels to a large list to analyze later, you could maintain one list of running values per channel and add each pixel's value, divided by the number of images, as you read the channel. That way you would create the three lists of mean values in one go (a sketch of this idea follows these comments). – Martin Wettstein Aug 13 '20 at 15:10
  • List comprehensions are usually slower than building the list without a comprehension. Expanding those might save you a bit of time if those comprehensions are very slow – Jacob Steinebronn Aug 13 '20 at 16:16
  • @JacobSteinebronn, [this](https://stackoverflow.com/a/60254921/843953) seems to indicate otherwise, as do a bunch of other links I've seen. Especially when appending to the list. – Pranav Hosangadi Aug 13 '20 at 16:35
  • That's neat! Personally, I've experimented and found that list comps are slower; maybe it depends on the application? In any case, it wasn't *that* much slower, just 20% or so – Jacob Steinebronn Aug 13 '20 at 16:44
  • Try `tifffile.imwrite('stacked.tif', numpy.stack([imagecodecs.imread(name) for name in glob.glob('*.jpg')]).mean(axis=0).round().astype('uint8'))` using [numpy](https://pypi.org/project/numpy/), [tifffile](https://pypi.org/project/tifffile/), and [imagecodecs](https://pypi.org/project/imagecodecs/) libraries. Takes about 2 seconds for the example dataset. – cgohlke Aug 14 '20 at 14:05
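
For reference, here is a minimal sketch of the running-sum idea from Martin Wettstein's comment above. It is not the original poster's code; it assumes the cropped frames live in a directory named 'cropped' (as in the question) containing only image files, and it averages the channels in a single pass, so only three lists of per-pixel totals are kept in memory regardless of how many frames there are.

import os
from PIL import Image

folder = 'cropped'                     # assumed location of the cropped frames
files = sorted(os.listdir(folder))     # assumes the directory contains only image files

sumR = sumG = sumB = None
size = None

for name in files:
    with Image.open(os.path.join(folder, name)) as img:
        if sumR is None:
            size = img.size
            n_pixels = size[0] * size[1]
            sumR = [0] * n_pixels      # one running total per pixel, per channel
            sumG = [0] * n_pixels
            sumB = [0] * n_pixels
        for i, v in enumerate(img.getdata(0)):  # accumulate instead of storing every frame
            sumR[i] += v
        for i, v in enumerate(img.getdata(1)):
            sumG[i] += v
        for i, v in enumerate(img.getdata(2)):
            sumB[i] += v

n = len(files)
merged = [(round(r / n), round(g / n), round(b / n))
          for r, g, b in zip(sumR, sumG, sumB)]

stacked = Image.new('RGB', size)
stacked.putdata(merged)
stacked.save('stacked.tif', 'TIFF')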

1 Answer


Appending to lists is slow, as is having multiple list comprehensions for something you could do in a single loop. You could also use numpy arrays to speed things up with vectorized (SIMD) operations instead of iterating over lists.

Here's some sample code for a few images. You can extend it as per your requirements.

import os
import numpy as np
import PIL.Image  # "import PIL" alone does not expose the Image submodule

os.chdir('cropped')

imgfiles = ['MVI_6450 001.jpg', 'MVI_6450 002.jpg', 'MVI_6450 003.jpg', 'MVI_6450 004.jpg']

allimgs = None

for imgnum, imgfile in enumerate(imgfiles):
    img = PIL.Image.open(imgfile)
    imgdata = np.array(img.getdata()) # Nx3 array. columns: R, G, B channels
    
    if allimgs is None:
        allshape = list(imgdata.shape) # Size of one image
        allshape.append(len(imgfiles)) # Append number of images
        # allshape is now [num_pixels, num_channels, num_images]
        # so making an array of this shape will allow us to store all images
        # Axis 0: pixels. Axis 1: channels. Axis 2: images
        allimgs = np.zeros(allshape) 
    
    allimgs[:, :, imgnum] = imgdata # Set the imgnum'th image data
    

# Get the mean along the last axis 
#     average same pixel across all images for each channel
imgavg = np.mean(allimgs, axis=-1) 

# normalize so that max value is 255
# Also convert to uint8
imgavg = np.uint8(imgavg / np.max(imgavg) * 255)

imgavg_tuple = tuple(map(tuple, imgavg))

stacked = PIL.Image.new("RGB", img.size)
stacked.putdata(imgavg_tuple)
stacked.show()

os.chdir('..')

Note: We create a numpy array large enough to hold all images at the start, instead of appending as we load more images, because appending to numpy arrays is a bad idea, as Jacob mentions in a comment below. numpy's append creates a brand-new array and copies the contents of both arrays on every call, so building an array of n items by repeated appends is an O(n^2) process.
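
As a rough illustration of that point (not part of the original answer), the difference between the two patterns looks like this; `load_images()` is a hypothetical loader and the sizes are made-up small values:

import numpy as np

n_images, n_pixels, n_channels = 4, 1000, 3   # made-up small sizes for illustration

# Slow pattern: np.append copies both arrays into a brand-new array on every
# call, so building the stack this way costs O(n^2) overall.
# allimgs = np.empty((n_pixels, n_channels, 0))
# for imgdata in load_images():               # hypothetical loader
#     allimgs = np.append(allimgs, imgdata[:, :, None], axis=2)

# Fast pattern: allocate once, then write each image into its slice in place.
allimgs = np.zeros((n_pixels, n_channels, n_images))
for i in range(n_images):
    imgdata = np.zeros((n_pixels, n_channels))  # stand-in for one image's pixel data
    allimgs[:, :, i] = imgdata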

Pranav Hosangadi
  • very important note: APPENDING TO NUMPY ARRAYS IS HORRIBLE! You'd be much better off appending to a regular list. This is because numpy.append is not in-place, so to build a list of n items by appending, you'll have O(n^2) which is clearly very bad. – Jacob Steinebronn Aug 13 '20 at 16:15
  • But I _don't_ append to numpy arrays anywhere in my code? – Pranav Hosangadi Aug 13 '20 at 16:16
  • I didn't say you did, was just pointing out to the poster (since they seem relatively new to this kind of thing) that building numpy arrays like that can be very bad – Jacob Steinebronn Aug 13 '20 at 16:17
  • That's a good point though. Worth making a note in my answer. I'll update it. – Pranav Hosangadi Aug 13 '20 at 16:18
  • @PranavHosangadi, Traceback (most recent call last): File "stackImg.py", line 26, in stacked.putdata(imgAvgTuple) File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1626, in putdata self.im.putdata(data, scale, offset) TypeError: integer argument expected, got float – Ranjit Pal Aug 14 '20 at 07:23
  • putdata() requires pixel values as ints, but mean() returns floats – Ranjit Pal Aug 14 '20 at 07:26
  • @RanjitPal, `imgavg = np.uint8(imgavg / np.max(imgavg) * 255)` converts it to uint – Pranav Hosangadi Aug 14 '20 at 14:40