Vectorization is the process of converting each explicit for loop into an array operation that is applied to a whole block of data at once. In Python (with NumPy), this means reimagining your data in terms of slices.
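As a toy illustration (the arrays here are just made up for the example), the idea is to replace an element-by-element loop with a single whole-array expression:

import numpy as np

a = np.arange(1000.0)
b = np.arange(1000.0)

## explicit loop: one Python-level operation per element
c = np.zeros(1000)
for i in range(1000):
    c[i] = a[i] * b[i]

## vectorized: the same multiplication as one array operation
c = a * b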
In the code below, I've provided a working vectorization of the kernel loop. This shows how to approach vectorization, but since it only vectorizes the 3x3 kernel, it doesn't give you the biggest available gains.
If you want to see a larger improvement, you'll vectorize the image array, which I've templated for you as well -- but left part of it as an exercise.
import numpy as np
from PIL import Image
## no vectorization
def applyFilterMethod1(image: np.ndarray, weightArray: np.ndarray) -> np.ndarray:
    rows, cols = image.shape
    height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for rrow in range(rows - height + 1):
        for ccolumn in range(cols - width + 1):
            for hheight in range(height):
                for wwidth in range(width):
                    ## accumulate one product per inner iteration
                    imgval = image[rrow + hheight, ccolumn + wwidth]
                    filterval = weightArray[hheight, wwidth]
                    output[rrow, ccolumn] += imgval * filterval
    return output
## vectorize the kernel loop (~3x improvement)
def applyFilterMethod2(image: np.ndarray, weightArray: np.ndarray) -> np.ndarray:
    rows, cols = image.shape
    height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for rrow in range(rows - height + 1):
        for ccolumn in range(cols - width + 1):
            imgval = image[rrow:rrow + height, ccolumn:ccolumn + width]  ## one kernel-sized window
            filterval = weightArray[:, :]  ## the whole kernel at once
            output[rrow, ccolumn] = np.sum(imgval * filterval)
    return output
## vectorize the image loop (~50x improvement)
def applyFilterMethod3(image: np.ndarray, weightArray: np.ndarray) -> np.ndarray:
    rows, cols = image.shape
    height, width = weightArray.shape
    output = np.zeros((rows - height + 1, cols - width + 1))
    for hheight in range(height):
        for wwidth in range(width):
            imgval = 0  ## TODO -- construct a compatible slice (left as an exercise; solution below)
            filterval = weightArray[hheight, wwidth]
            output[:, :] += imgval * filterval
    return output
src = Image.open("input.png")
sb = np.asarray(src.convert("L"))  ## force grayscale; only grayscale is supported here
cb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]])
cb = cb / cb.sum()  ## normalize the kernel
db = applyFilterMethod2(sb, cb)
dst = Image.fromarray(db)
dst.convert("L").save("output.png")
#src.show() ; dst.show()
Note: You could probably remove all four for loops, with some additional complexity. However, because doing so would only eliminate the overhead of 9 inner iterations (in this example), I don't expect it to yield any additional performance gain over applyFilterMethod3. Furthermore, although I haven't attempted it, I suspect the machinery needed to do it might add more overhead than it removes.
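For the curious, here's a sketch of how it might look -- not benchmarked, it assumes NumPy 1.20 or newer for sliding_window_view, and applyFilterMethod4 is just a name I'm making up here:

from numpy.lib.stride_tricks import sliding_window_view

def applyFilterMethod4(image, weightArray):
    ## every kernel-sized window of the image, as a 4-D view:
    ## shape is (rows - height + 1, cols - width + 1, height, width)
    windows = sliding_window_view(image, weightArray.shape)
    ## weight each window by the kernel and sum over the last two axes
    return np.einsum("ijkl,kl->ij", windows, weightArray)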
FYI: This is a standard image convolution (supporting only grayscale as implemented). I always like to point out that, in order to be mathematically correct, this would need to compensate for the gamma compression that is implicit in nearly every default image encoding -- but this little detail is often ignored.
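If you did want to account for it, a rough sketch (assuming a plain 2.2 gamma rather than the exact sRGB transfer curve, and reusing sb and cb from the driver code above) would decode to linear light before filtering and re-encode afterwards:

## rough gamma handling -- assumes a simple 2.2 gamma, not the exact sRGB curve
linear = (sb / 255.0) ** 2.2                ## decode to (roughly) linear light
filtered = applyFilterMethod2(linear, cb)   ## filter in linear space
db = (filtered ** (1 / 2.2)) * 255.0        ## re-encode before saving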
Discussion
This type of vectorization is often necessary in Python, specifically, because the standard Python interpreter is extremely inefficient at processing large for loops. Explicitly iterating over each pixel of an image therefore wastes a lot of time. Ultimately, though, the vectorized implementation does not change the amount of real work performed, so we're only talking about eliminating interpreter overhead from the algorithm.
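If you want to measure it yourself, a quick timing harness along these lines will do (the image size and repeat count are arbitrary, and applyFilterMethod3 only produces correct output once the TODO above is filled in):

import timeit

test = np.random.rand(200, 200)  ## arbitrary test image; reuses cb from above
for fn in (applyFilterMethod1, applyFilterMethod2, applyFilterMethod3):
    t = timeit.timeit(lambda: fn(test, cb), number=3)
    print(fn.__name__, round(t, 3), "s")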
However, vectorization has a side benefit: parallelization. Lumping a large amount of data processing onto a single operator gives the language/library much more flexibility in how to optimize the execution. That might include running your embarrassingly parallel operation on a GPU, if you have the right tools -- for example, the TensorFlow image module.
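As a sketch of what that can look like, here is the same "valid" filtering expressed as a single call to tf.nn.conv2d (not the image module itself; the reshapes just satisfy its expected batch/channel layout, and whether it actually runs on a GPU depends on your installation):

import tensorflow as tf

## one library call in place of all four loops; reuses sb and cb from above
## (tf.nn.conv2d correlates rather than flips the kernel, matching the loop code)
img4d = tf.constant(sb.reshape(1, *sb.shape, 1), dtype=tf.float32)
ker4d = tf.constant(cb.reshape(*cb.shape, 1, 1), dtype=tf.float32)
out = tf.nn.conv2d(img4d, ker4d, strides=1, padding="VALID")
db_tf = out.numpy()[0, :, :, 0]  ## same shape as applyFilterMethod2's output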
Python's seamless support for array programming is one reason it has become so popular for machine learning, which can be extremely compute-intensive.
Solution
Here's the solution to the imgval assignment, which was left as an exercise above.
imgval = image[hheight:hheight + rows - height + 1, wwidth:wwidth + cols - width + 1]
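A quick sanity check (the random test image and the tolerance check are just mine, not part of the exercise): once this slice is plugged into applyFilterMethod3, all three methods should agree.

test = np.random.rand(64, 64)
assert np.allclose(applyFilterMethod1(test, cb), applyFilterMethod3(test, cb))
assert np.allclose(applyFilterMethod2(test, cb), applyFilterMethod3(test, cb))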