It's impossible to do this (i.e., without removing the for loop) in pure Python. Python's for loop construct has too much going on to be fast. If you really want to keep the for loop, the only solution is Numba or Cython, but these have their own set of issues. Normally, such loops are written in C/C++ (the most straightforward option, in my opinion) and then called from Python, whose main role is then that of a scripting language.
Having said that, OpenCV and NumPy provide enough useful routines that in 90% of cases you can simply use built-in functions without having to resort to writing your own pixel-level code.
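As a rough illustration of the built-in route, the same intensity histogram can be sketched with NumPy alone. This is only a sketch under assumptions: the helper name `intensity_histogram` is made up for this example, the image is assumed to be an H x W x 3 `uint8` array (which is what `cv.imread` returns for a colour image), and a random array stands in for the real file.

```python
import numpy as np

def intensity_histogram(img):
    """Histogram of the per-pixel mean of three channels (hypothetical helper)."""
    # Sum the channels in a wide integer type so three uint8 values cannot overflow.
    total = img.sum(axis=2, dtype=np.int64)
    # Same binning as the explicit loop: integer mean over the three channels.
    mean = total // 3
    # Count how many pixels fall into each of the 256 possible intensities.
    return np.bincount(mean.ravel(), minlength=256)

# Synthetic stand-in for an image loaded with cv.imread (assumption for this sketch).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
bins = intensity_histogram(img)
```

No Python-level loop runs per pixel here; the summation, the division and the counting each happen inside a single NumPy call.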
Here's a solution in Numba that keeps your looping code as it is. On my computer it's about 150 times faster than pure Python.
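import numpy as np, cv2 as cv
from time import time
from numba import jit, int32, int64, uint8

# An explicit signature makes Numba compile at decoration time, so the
# timing further down does not include compilation. The bins type is
# int32 to match np.zeros(256, np.int32).
@jit((uint8[:, :, :], int32[:]),
     locals=dict(intensity=int64),
     nopython=True)
def numba(img, bins):
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            intensity = 0
            for k in range(0, len(img[i][j])):
                intensity += img[i][j][k]
            # Integer division: the bin index must be an int.
            bins[intensity // 3] += 1
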
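def python(img, bins):
    for i in range(0, img.shape[0]):
        for j in range(0, img.shape[1]):
            intensity = 0
            for k in range(0, len(img[i][j])):
                # int() keeps the sum a Python int, so it cannot wrap
                # around as a uint8 under NumPy 2 promotion rules.
                intensity += int(img[i][j][k])
            # Integer division: the bin index must be an int.
            bins[intensity // 3] += 1
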
img = cv.imread("image.jpg")
bins = np.zeros(256, np.int32)

# Time the Numba version.
t0 = time()
numba(img, bins)
t1 = time()
# print(bins)
print(t1 - t0)

# Reset the histogram, then time the pure Python version.
bins[...] = 0
t0 = time()
python(img, bins)
t1 = time()
# print(bins)
print(t1 - t0)
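
A single timed call can be noisy, so the comparison above could also be made with a small best-of-N helper. This is a minimal sketch, not part of the original measurement: `best_time` and `vectorized` are hypothetical names, and `vectorized` is only a NumPy stand-in with the same `(img, bins)` signature as the two functions above, so that the snippet runs without an image file or Numba.

```python
from time import perf_counter
import numpy as np

def best_time(func, img, repeats=5):
    """Return the fastest of `repeats` timed calls to func(img, bins)."""
    best = float("inf")
    for _ in range(repeats):
        bins = np.zeros(256, np.int32)  # fresh histogram for every run
        t0 = perf_counter()
        func(img, bins)
        best = min(best, perf_counter() - t0)
    return best

def vectorized(img, bins):
    # Hypothetical stand-in that fills bins in place, like the loop versions.
    mean = img.sum(axis=2, dtype=np.int64) // 3
    bins += np.bincount(mean.ravel(), minlength=256).astype(bins.dtype)

# Synthetic image used in place of cv.imread (assumption for this sketch).
img = np.random.default_rng(0).integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
elapsed = best_time(vectorized, img)
print(elapsed)
```

Passing `numba` or `python` instead of `vectorized` would time those functions the same way.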