
I'm working on teaching myself the basics of computerized image processing, and I am teaching myself Python at the same time.

Given an image x of dimensions 2048x1354 with 3 channels, efficiently calculate the histogram of the pixel intensities.

import numpy as np, cv2 as cv

img = cv.imread("image.jpg")
bins = np.zeros(256, np.int32)

for i in range(0, img.shape[0]):
    for j in range(0, img.shape[1]):

        intensity = 0
        for k in range(0, len(img[i][j])):
            intensity += int(img[i][j][k])  # int() avoids uint8 wrap-around

        bins[intensity // 3] += 1  # integer division keeps the index an int

print(bins)

My issue is that this code runs pretty slowly, as in ~30 seconds. How can I speed this up and be more Pythonic?

Drewness
Jason
  • http://stackoverflow.com/a/14728935/995394 Maybe this is helpful. – iMom0 Mar 03 '14 at 22:50
  • With 3 nested `for` loops, your algorithm executes in O(n^3) time, which is very sluggish. – geoff Mar 03 '14 at 22:56
  • This is not exactly related to your original question, but consider using a better algorithm for generating the histogram. Since you're probably interested in the perceived colors, you could try using a luminance calculation: http://stackoverflow.com/questions/596216/formula-to-determine-brightness-of-rgb-color – akirilov Mar 03 '14 at 22:59
  • @geoff the third loop only runs a constant number of times, probably 3. It's not proportional to the size of the image. – Mark Ransom Mar 03 '14 at 23:07
  • One small thing would be to replace the inner loop: `for k in img[i][j]: intensity += k`. Looping over `range(len(...))` is never a good sign. Even better would be to use `sum(img[i][j])` and eliminate the loop entirely. – Mark Ransom Mar 03 '14 at 23:21
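
As a rough illustration of the luminance suggestion in the comments above, here is a minimal NumPy-only sketch. The Rec. 601 weights (0.299, 0.587, 0.114) are one common choice and are an assumption here, not something the question specifies. The synthetic array stands in for an image loaded with `cv.imread`, which stores channels in B, G, R order.

```python
import numpy as np

# Synthetic stand-in for a small BGR image (hypothetical data, not a real photo).
rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)

# Rec. 601 luma weights, reordered to match B, G, R channel order (assumed choice).
weights_bgr = np.array([0.114, 0.587, 0.299])

# Weighted sum over the channel axis, truncated to integers in 0..255.
luma = (img @ weights_bgr).astype(np.int64)

# One count per luma value.
bins = np.bincount(luma.ravel(), minlength=256)
print(bins.sum())  # equals the number of pixels
```

Compared with the plain channel mean, this weights green most heavily, which is closer to how brightness is perceived.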

5 Answers


You can use the newer OpenCV Python interface, which natively uses NumPy arrays, and plot the histogram of the pixel intensities using matplotlib's hist. It takes less than a second on my computer.

import matplotlib.pyplot as plt
import cv2

im = cv2.imread('image.jpg')
# calculate mean value from RGB channels and flatten to 1D array
vals = im.mean(axis=2).flatten()
# plot histogram with 255 bins
b, bins, patches = plt.hist(vals, 255)
plt.xlim([0,255])
plt.show()


UPDATE: The number of bins specified above does not always give the desired result, because the min and max are calculated from the actual values. Moreover, counts for values 254 and 255 are summed in the last bin. Here is updated code which always plots the histogram correctly, with bars centered on values 0..255:

import numpy as np
import matplotlib.pyplot as plt
import cv2

# read image
im = cv2.imread('image.jpg')
# calculate mean value from RGB channels and flatten to 1D array
vals = im.mean(axis=2).flatten()
# calculate histogram
counts, bins = np.histogram(vals, range(257))
# plot histogram centered on values 0..255; align='edge' makes each bar span [v - 0.5, v + 0.5]
plt.bar(bins[:-1] - 0.5, counts, width=1, edgecolor='none', align='edge')
plt.xlim([-0.5, 255.5])
plt.show()
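
To make the binning issue concrete, here is a small NumPy-only sketch. The `vals` array is a synthetic stand-in (every integer from 0 to 255 exactly once), not data from a real image. It compares 255 automatic bins with explicit edges 0..256.

```python
import numpy as np

# Synthetic stand-in for the per-pixel means: each integer 0..255 once.
vals = np.arange(256, dtype=np.float64)

# 255 equal-width bins between min and max: the last bin is [254, 255].
counts_255, edges_255 = np.histogram(vals, 255)

# Explicit edges 0, 1, ..., 256: one bin per integer value.
counts_256, edges_256 = np.histogram(vals, np.arange(257))

print(counts_255[-1])  # values 254 and 255 share the last bin
print(counts_256[-1])  # only 255 falls in the last bin
```

With explicit edges, every integer intensity gets its own bin regardless of which values actually occur in the image.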


Ondro
  • Could you please explain the motivation behind `plt.bar(bins[:-1] - 0.5, counts, width=1, edgecolor='none')`? – Beginner Sep 21 '19 at 20:55
  • Good point, `plt.bar(bins[:-1], counts, width=1, edgecolor='none')` should be used to plot the histogram centered on values 0..255. – Ondro Sep 26 '19 at 21:36

If you just want to count the number of occurrences of each value in an array, NumPy can do that for you using numpy.bincount. In your case:

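import numpy

arr  = numpy.asarray(img)
flat = arr.reshape(numpy.prod(arr.shape[:2]), -1)
# integer mean per pixel; an int64 sum gives bincount the integer input it requires
bins = numpy.bincount(flat.sum(axis=1, dtype=numpy.int64) // flat.shape[1], minlength=256)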

I'm using numpy.asarray here to make sure that img is a numpy array, so I can flatten it to the one-dimensional array bincount needs. If img is already an array, you can skip that step. The counting itself will be very fast. Most of the time here will probably be spent in converting the cv matrix to an array.

Edit: According to this answer, you may need to use numpy.asarray(img[:,:]) (or possibly img[:,:,:]) in order to successfully convert the image to an array. On the other hand, according to this, what you get out from newer versions of openCV is already a numpy array. So in that case you can skip the asarray completely.
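
As a sanity check on the approach above, the following sketch compares a vectorized bincount histogram with an explicit per-pixel loop on a small synthetic image. The function names `histogram_loop` and `histogram_bincount` are made up for this illustration, and the random array merely stands in for a real image.

```python
import numpy as np

def histogram_loop(img):
    """Reference version: explicit per-pixel loop, in the spirit of the question."""
    bins = np.zeros(256, dtype=np.int64)
    for row in img:
        for pixel in row:
            bins[sum(int(c) for c in pixel) // len(pixel)] += 1
    return bins

def histogram_bincount(img):
    """Vectorized version: integer channel mean per pixel, then one bincount call."""
    flat = img.reshape(-1, img.shape[-1])
    means = flat.sum(axis=1, dtype=np.int64) // flat.shape[1]
    return np.bincount(means, minlength=256)

# Synthetic 32x48 three-channel image (hypothetical data, not a real photo).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)

print(np.array_equal(histogram_loop(img), histogram_bincount(img)))
```

Both versions use integer division of the channel sum, so they should agree bin for bin. The vectorized one avoids the Python-level loops entirely.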

amaurea
  • This is the best answer. Using CV2 is like using a cannon to kill a mosquito. No need to use openCV for everything when there is pure numpy or numpy based libraries like scikit-image – lesolorzanov Mar 11 '19 at 13:38

It's impossible to do this quickly in pure Python without removing the for loops. Python's for loop construct has too much going on to be fast. If you really want to keep the for loops, the only solutions are numba or Cython, but these have their own set of issues. Normally, such loops are written in C/C++ (the most straightforward option, in my opinion) and then called from Python, whose main role is that of a scripting language.

Having said that, OpenCV and NumPy provide enough useful routines that in 90% of cases it's possible to simply use built-in functions without having to resort to writing your own pixel-level code.

Here's a solution in numba that keeps your looping code. On my computer it's about 150 times faster than pure Python.

import numpy as np, cv2 as cv

from time import time
from numba import njit

# njit compiles the loops below to machine code (nopython mode)
@njit
def numba(img, bins):
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            intensity = 0
            for k in range(0, len(img[i][j])):
                intensity += img[i][j][k]
            bins[intensity // 3] += 1


def python(img, bins):
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            intensity = 0
            for k in range(0, len(img[i][j])):
                intensity += int(img[i][j][k])  # int() avoids uint8 wrap-around
            bins[intensity // 3] += 1

img = cv.imread("image.jpg")
bins = np.zeros(256, np.int32)

numba(img, bins)  # warm-up call so JIT compilation is not timed
bins[...] = 0

t0 = time()
numba(img, bins)
t1 = time()
#print(bins)
print(t1 - t0)

bins[...] = 0
t0 = time()
python(img, bins)
t1 = time()
#print(bins)
print(t1 - t0)
Zaw Lin

Take a look at MatPlotLib. This should take you through everything you want to do, and without the for loops.

Wesley Bowman
  • I know tools already exist. However, I want to use this as a learning opportunity for both the language and algorithms. – Jason Mar 03 '14 at 23:18
  • A huge part of Python is learning what tools are available, and matplotlib is a huge library that I use in almost all of my code. I understand you want to learn the language, but Python's utility is that there are so many tools that allow you to do all kinds of things easily and efficiently. – Wesley Bowman Mar 03 '14 at 23:23
  • Another way to speed it up would be to use NumPy, but there again you are using a library to help you. Python isn't the best for 'for' loops. You can vectorize this code with NumPy, or use Matplotlib to do it even more simply. – Wesley Bowman Mar 03 '14 at 23:24
  • The main thing that makes Python great isn't the language itself (though that's nice too, if slow). It's its huge set of standard libraries, and if you don't use them, you're crippling Python. – amaurea Mar 03 '14 at 23:53
  • Agreed, this is very fast. Interestingly, `matplotlib` and `cv2` import images differently, which leads to different values in the img variable. If you import using `cv2`, the pixel values will be in [0, 255]. If you import the image using `matplotlib`, the values will be in [0, 1]. – alpha_989 Feb 02 '18 at 17:56

From OpenCV docs:

One-channel histogram (image converted to grayscale):

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('home.jpg',0)
plt.hist(img.ravel(),256,[0,256]); plt.show()

RGB histogram (each channel separately)

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('home.jpg')
color = ('b','g','r')
for i,col in enumerate(color):
    histr = cv.calcHist([img],[i],None,[256],[0,256])
    plt.plot(histr,color = col)
    plt.xlim([0,256])
plt.show()
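
For comparison, the per-channel counts can also be sketched without OpenCV. This is a minimal NumPy-only illustration on a synthetic array. It assumes 256 bins over [0, 256), so each integer value gets its own bin, as in the `cv.calcHist` call above.

```python
import numpy as np

# Synthetic stand-in for a BGR image as returned by cv.imread (hypothetical data).
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(20, 30, 3), dtype=np.uint8)

# One 256-entry histogram per channel, in the array's channel order.
per_channel = np.stack([
    np.bincount(img[..., c].ravel(), minlength=256)
    for c in range(img.shape[-1])
])
print(per_channel.shape)
```

Each row of `per_channel` can then be passed to `plt.plot` in the same way as `histr` above.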
apatsekin