You can use a dynamic programming technique to make it many times faster. Preprocess the matrix by starting the top left corner and moving left to right, then top to bottom, setting each cell to be the sum of its value and the one above (if it exists) and the one to the left (if it exists). When you get to the end, the bottom right value should be the total sum of the whole matrix.
for x in xrange(4000):
for y in xrange(4000):
c[x,y] = a[x,y]
if x > 0:
c[x,y] += c[x-1,y]
if y > 0:
c[x,y] += c[x,y-1]
and now you can get the sum of any rectangular region by subtract the top left corner from the top right: eg. in this case, the sum of the 11x11 region will be
c[x+5,y+5]-c[x-5,y-5]
Then you can just divide by the size of the window to get the local average:
b[x,y] = (c[x+5+,y+5]-c[x-5,y-5])/121
Now instead of iterating over 121 spots for each element in the matrix, you only have to make 2 passes over the matrix with no iterations for each element.