
I wrote a Python script that combines images in unique ways for an OpenGL shader. The problem is that I have a large number of very large maps, and processing them takes a long time. Is there a way to write this so that it runs faster?

    import numpy as np
    from PIL import Image

    # Function name and parameters are hypothetical: the original snippet omitted the header.
    def combine_maps(inputRoot, names, resolution):
        map_data = {}
        image_data = {}
        for map_postfix in names:
            file_name = inputRoot + '-' + map_postfix + resolution + '.png'
            print('Loading ' + file_name)
            image_data[map_postfix] = Image.open(file_name, 'r')
            # .load() returns a PIL pixel-access object, indexed [x, y]
            map_data[map_postfix] = image_data[map_postfix].load()

        color = map_data['ColorOnly']
        ambient = map_data['AmbientLight']
        shine = map_data['Shininess']

        width, height = image_data['ColorOnly'].size

        arr = np.zeros((height, width, 4), dtype=int)

        # Per-pixel Python loop: almost all of the run time is spent here.
        for i in range(width):
            for j in range(height):
                ambient_mod = ambient[i, j][0] / 255.0
                arr[j, i, :] = [color[i, j][0] * ambient_mod,
                                color[i, j][1] * ambient_mod,
                                color[i, j][2] * ambient_mod,
                                shine[i, j][0]]

        print('Converting Color Map to image')
        return Image.fromarray(arr.astype(np.uint8))

This is just a sample from a large number of batch processes, so I am more interested in whether there is a faster way to iterate over and modify an image file. Almost all the time is spent in the nested loop rather than in loading and saving.

David
  • Are you familiar with the idea that `numpy` works much faster when you try to operate on the whole array at once (or at least whole vectors at once), rather than looping through individual elements? This question seems like a pretty typical example of that problem. – Marius Sep 30 '14 at 01:40
  • I forgot to show my import statement at the top. Is this the correct usage? Is there more I should be using NumPy for? – David Sep 30 '14 at 04:44
  • You should be trying to perform your multiplications, divisions, etc. on the whole `color` and `shine` arrays, not on their individual elements, and likewise create an `ambient_mod` array with something like `ambient_mod_arr = ambient[:, :, 0] / 255.0`. This approach is quite hard to get your head around initially, and too hard for me to explain in a single SO answer, but it's pretty fundamental to using numpy efficiently. – Marius Sep 30 '14 at 06:47
  • Alright, this makes sense. This is how math works inside the OpenGL Shading Language (GLSL), so I just need to get the syntax down for vector math in this context. – David Sep 30 '14 at 13:47
  • In addition to Marius's suggestion, you might try the [numba](http://numba.pydata.org/) optimizer (BSD License). Numba lets you select functions and JIT-compile them. – FabienAndre Sep 30 '14 at 17:31
  • @FabienAndre - Thanks! If I can figure out one last issue in my code, I think I'll be in the clear, but I'll definitely keep this in mind for the future. – David Sep 30 '14 at 18:23
  • @FabienAndre You may be interested in reviewing http://stackoverflow.com/a/26127757/3666197, where the real-world performance of **`numba`** JIT was tested and measured against just an improved code design in plain `numpy`, showing that `numba` JIT compilation plus code execution was **3x slower** than a good plain code design. **Worth reading, testing / measuring and evaluating the effect against your project's individual needs, rather than relying on an inappropriate one-size-fits-all generalisation.** – user3666197 Oct 03 '14 at 13:48
  • @user3666197 Indeed, compiling takes time and is not always worth doing for *one-shot* code. However, in the link you provided, **the numba version runs more than 100x faster than yours**. Worth considering. – FabienAndre Oct 05 '14 at 12:25
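A minimal sketch of Marius's whole-array suggestion, assuming `color`, `ambient` and `shine` have already been converted to NumPy arrays of shape `(height, width, channels)` (for example with `np.asarray(Image.open(file_name))`), since the pixel-access objects returned by `.load()` do not support NumPy-style slicing. A NumPy image array is indexed `[row, column]`, i.e. `[y, x]`, the reverse of PIL's `[x, y]`. The function names and the random test data are hypothetical; the check compares the per-pixel loop with the whole-array version.

```python
import numpy as np


def combine_loop(color, ambient, shine):
    """Per-pixel version of the question's loop, rewritten for NumPy [row, col] indexing."""
    height, width = color.shape[:2]
    arr = np.zeros((height, width, 4), dtype=int)
    for i in range(width):
        for j in range(height):
            ambient_mod = ambient[j, i, 0] / 255.0
            # float products are truncated to int when stored in arr
            arr[j, i, :3] = color[j, i, :3] * ambient_mod
            arr[j, i, 3] = shine[j, i, 0]
    return arr


def combine_whole_array(color, ambient, shine):
    """Same arithmetic, applied to whole 2D channel planes at once."""
    height, width = color.shape[:2]
    ambient_mod_arr = ambient[:, :, 0] / 255.0  # shape (height, width)
    arr = np.zeros((height, width, 4), dtype=int)
    for c in range(3):  # 3 iterations instead of width * height
        arr[:, :, c] = color[:, :, c] * ambient_mod_arr
    arr[:, :, 3] = shine[:, :, 0]
    return arr


rng = np.random.default_rng(0)
color = rng.integers(0, 256, size=(6, 8, 4), dtype=np.uint8)  # hypothetical RGBA data
ambient = rng.integers(0, 256, size=(6, 8, 4), dtype=np.uint8)
shine = rng.integers(0, 256, size=(6, 8, 4), dtype=np.uint8)
print(np.array_equal(combine_loop(color, ambient, shine),
                     combine_whole_array(color, ambient, shine)))
```

The only Python-level loop left runs once per channel rather than once per pixel; the per-pixel work happens inside NumPy.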

1 Answer


Vectorised-code example: test the effect on your code with timeit or zmq.Stopwatch()

Reported speedup: from 22.14 seconds down to 0.1624 seconds!

While your code loops over RGBA[x,y] one pixel at a time, let me show the "vectorised" syntax of some code that benefits from numpy's whole-matrix operations. Forget the RGB/YUV manipulation itself (it was originally based on OpenCV rather than PIL), but reuse the vectorised-syntax approach to avoid for-loops and adapt it to work efficiently for your calculation. A poor order of operations may more than double your processing time.

Use a test / optimise / re-test loop when speeding things up.

For timing, the standard Python timeit module is fine if millisecond resolution is enough.

Use zmq.Stopwatch() instead if you need microsecond resolution.
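A minimal timing sketch with the standard-library `timeit` module, which accepts a callable directly. The 200 x 300 random planes stand in for one colour channel and the ambient map; the array and function names are hypothetical. It also checks that both versions produce identical results before you trust the timings.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 200 x 300 test planes standing in for one colour channel and the ambient map.
channel = rng.integers(0, 256, size=(200, 300), dtype=np.uint8)
ambient = rng.integers(0, 256, size=(200, 300), dtype=np.uint8)


def scale_loop():
    out = np.zeros(channel.shape, dtype=int)
    for j in range(channel.shape[0]):
        for i in range(channel.shape[1]):
            out[j, i] = channel[j, i] * (ambient[j, i] / 255.0)
    return out


def scale_vectorised():
    out = np.zeros(channel.shape, dtype=int)
    out[:, :] = channel * (ambient / 255.0)
    return out


# timeit.timeit(callable, number=n) returns the total elapsed seconds for n calls.
t_loop = timeit.timeit(scale_loop, number=1)
t_vec = timeit.timeit(scale_vectorised, number=20) / 20
print("loop: %.1f ms, vectorised: %.3f ms" % (t_loop * 1e3, t_vec * 1e3))
```

Averaging the fast version over several calls keeps a single noisy measurement from dominating the comparison.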

    # Vectorised-code example, to show the syntax & principles.
    # Do not mind the channel order: the frame is labelled B, R, G below
    # (OpenCV's own default order is BGR); the order has no other
    # meaning in this demo of VECTORISED code.
    import numpy as np

    def get_YUV_U_Cb_Rec709_BRG_frame( brgFRAME ):  # Rec. 709 primaries (gamma-corrected sRGB); fast, VECTORISED MUL/ADD CODE
        out =  np.zeros(               brgFRAME.shape[0:2] )
        out -= 0.09991 / 255 *         brgFRAME[:,:,1]  # // Red
        out -= 0.33609 / 255 *         brgFRAME[:,:,2]  # // Green
        out += 0.436   / 255 *         brgFRAME[:,:,0]  # // Blue
        return out
    # normalise to <0.0 - 1.0> before the vectorised MUL/ADD (the /255 is folded into
    # each scalar coefficient), saves [usec] ...
    # on 480x640 [px] this takes about 2.2 [msec] instead of 5.4 [msec]
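The same weighted sum can also be written as one matrix product over the channel axis: `@` (Python 3.5+, NumPy 1.10+) computes one dot product per pixel for an `(H, W, 3) @ (3,)` operation. This is an alternative to measure, not a guaranteed win; the weight-vector and function names are hypothetical, and the B, R, G order simply follows the labels in the example.

```python
import numpy as np

# Rec. 709 U (Cb) weights in the B, R, G channel order used in the example,
# pre-divided by 255 so the result is normalised as in the original function.
U_WEIGHTS_BRG = np.array([0.436, -0.09991, -0.33609]) / 255


def u_channel_sequential(brg_frame):
    """The example's approach: three separate vectorised MUL/ADD passes."""
    out = np.zeros(brg_frame.shape[0:2])
    out -= 0.09991 / 255 * brg_frame[:, :, 1]  # Red
    out -= 0.33609 / 255 * brg_frame[:, :, 2]  # Green
    out += 0.436 / 255 * brg_frame[:, :, 0]  # Blue
    return out


def u_channel_matmul(brg_frame):
    """Same weighted sum as a single (H, W, 3) @ (3,) -> (H, W) product."""
    return brg_frame.astype(np.float64) @ U_WEIGHTS_BRG


frame = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(np.allclose(u_channel_sequential(frame), u_channel_matmul(frame)))
```

The two results agree only up to floating-point rounding, because the additions happen in a different order, so compare them with `np.allclose` rather than exact equality.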

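In your case, using dtype = numpy.int, I guess it will be faster to MUL by ambient[:,:,0] first and only then DIV to normalise: arr[:,:,:3] //= 255 (integer floor division, which works in both Python 2 and 3; a plain /= on an int array fails under Python 3). This requires color and ambient to be NumPy arrays of a wider integer type than uint8; the .load() pixel-access objects cannot be sliced like this.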

    # test if this goes even faster once saving the vectorised overhead of a separate matrix DIV
    # assumes color, ambient are NumPy int arrays, e.g. np.asarray( img ).astype( int ),
    # because uint8 * uint8 would overflow before the DIV
    arr[:,:,0] = color[:,:,0] * ambient[:,:,0] // 255  # MUL remains INT, shall precede DIV
    arr[:,:,1] = color[:,:,1] * ambient[:,:,0] // 255  #
    arr[:,:,2] = color[:,:,2] * ambient[:,:,0] // 255  #
    arr[:,:,3] = shine[:,:,0]                          # STO alpha
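Why the cast to a wider integer type matters: `np.asarray()` on an 8-bit RGB(A) image returns a `uint8` array, and NumPy integer arithmetic on arrays wraps around silently on overflow. A minimal sketch with two hypothetical pixel values:

```python
import numpy as np

# Hypothetical 8-bit pixel values, as np.asarray() returns them for an 8-bit RGB(A) PNG.
red = np.array([[200, 10]], dtype=np.uint8)
ambient = np.array([[128, 255]], dtype=np.uint8)

# uint8 * uint8 stays uint8: 200 * 128 = 25600 wraps modulo 256 to 0 before the division.
wrapped = red * ambient // 255

# Widening to int32 first keeps the multiply-then-divide exact.
widened = red.astype(np.int32) * ambient.astype(np.int32) // 255

# The float variant David posted in the comments gives the same result for these values.
via_float = (red * (ambient / 255.0)).astype(int)

print(wrapped.tolist(), widened.tolist(), via_float.tolist())
# prints: [[0, 0]] [[100, 10]] [[100, 10]]
```

So keeping the arithmetic in integers only pays off once the operands have been widened; otherwise the result is silently wrong rather than merely imprecise.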

So how might it look in your algorithm?

One does not need Peter Jackson's impressive budget and time (he planned, spanned and executed an immense amount of number-crunching over 3 years in a New Zealand hangar crowded with a herd of SGI workstations while producing the fully digital mastering assembly line for "The Lord of the Rings", right down to frame-by-frame pixel manipulation) to realise that milliseconds, microseconds and even nanoseconds simply do matter in a mass-production pipeline.

So take a deep breath, then test and re-test to optimise your real-world image-processing performance to the level your project needs.

Hope this helps:

    # OPTIONAL for performance testing -------------# ||||||||||||||||||||||||||||||||
    import zmq                                      # zmq.Stopwatch: _MICROSECOND_ timer
    #                                               # timer-resolution step ~ 21 nsec
    #                                               # Yes, NANOSECOND-s
    # OPTIONAL for performance testing -------------# ||||||||||||||||||||||||||||||||
    arr        = np.zeros( ( height, width, 4 ), dtype = int )
    aStopWatch = zmq.Stopwatch()                    # ||||||||||||||||||||||||||||||||
    # /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\# <<< your original code segment
    #  aStopWatch.start()                           # |||||||||||||__.start
    #  for i in range(     width  ):                # runs on the .load() pixel-access objects
    #      for j in range( height ):
    #          ambient_mod  = ambient[i,j][0] / 255.0
    #          arr[j, i, :] = [ color[i,j][0] * ambient_mod, \
    #                           color[i,j][1] * ambient_mod, \
    #                           color[i,j][2] * ambient_mod, \
    #                           shine[i,j][0]                \
    #                           ]
    #  usec_for = aStopWatch.stop()                 # |||||||||||||__.stop
    #  print( 'Converting Color Map to image' )
    #  print( '           FOR processing took %d [usec]' % usec_for )
    # /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\# <<< proposed alternative
    # The slicing below needs NumPy arrays (not .load() pixel-access objects),
    # widened from uint8 to int so that color * ambient cannot overflow before the DIV:
    color   = np.asarray( image_data['ColorOnly']    ).astype( int )
    ambient = np.asarray( image_data['AmbientLight'] ).astype( int )
    shine   = np.asarray( image_data['Shininess']    )
    aStopWatch.start()                              # |||||||||||||__.start
    # one channel at a time: 2D operands, one dimension less to broadcast (ref. comments below)
    arr[:,:, 0]  = color[:,:,0] * ambient[:,:,0]    # MUL ambient[0]  * [{R}]
    arr[:,:, 1]  = color[:,:,1] * ambient[:,:,0]    # MUL ambient[0]  * [{G}]
    arr[:,:, 2]  = color[:,:,2] * ambient[:,:,0]    # MUL ambient[0]  * [{B}]
    arr[:,:,:3] //= 255                             # integer DIV 255 to normalise
    arr[:,:, 3]  = shine[:,:,0]                     # STO shine[  0] in [3]
    usec_Vector  = aStopWatch.stop()                # |||||||||||||__.stop
    print( 'Converting Color Map to image' )
    print( '           Vectorised processing took %d [usec]' % usec_Vector )
    return Image.fromarray( arr.astype( np.uint8 ) )
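For reference, the integer-only approach above can be packaged as a self-contained function that takes NumPy arrays instead of PIL pixel-access objects. This is a sketch under assumptions: the name `combine_maps_vectorised` is hypothetical, and the inputs are assumed to be 8-bit arrays of shape `(height, width, channels)` such as `np.asarray(Image.open(file_name))` returns.

```python
import numpy as np


def combine_maps_vectorised(color, ambient, shine):
    """Integer-only vectorised version of the question's per-pixel loop (hypothetical helper).

    color, ambient, shine: uint8 arrays of shape (height, width, channels).
    Returns a (height, width, 4) uint8 array suitable for Image.fromarray().
    """
    color = color.astype(np.int32)  # widen so color * ambient cannot wrap around
    ambient = ambient.astype(np.int32)
    height, width = color.shape[:2]
    arr = np.zeros((height, width, 4), dtype=np.int32)
    for c in range(3):  # one 2D multiply per channel: R, G, B
        arr[:, :, c] = color[:, :, c] * ambient[:, :, 0]
    arr[:, :, :3] //= 255  # normalise with integer floor division
    arr[:, :, 3] = shine[:, :, 0]  # shininess goes into the alpha channel
    return arr.astype(np.uint8)
```

With PIL, the result would be passed to `Image.fromarray(...)` as in the original code; the `np.asarray(...)` conversion happens once per file, outside any per-pixel loop.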
user3666197
  • I forgot to show my import statement at the top before: `import numpy as np`. Is there more I should be using it for? – David Sep 30 '14 at 04:44
  • No problem, David. No other import is needed. NumPy is designed to analyse and accelerate the order and scale of iterative matrix operations internally, and it also takes its internal data representation into account (FORTRAN ordering, C ordering, sparse mapping of the actual data cells), so rather forget the internals and stay on top of the numpy array abstraction. Also, since you work with byte-coded RGBA, keep most operations in numpy.int, which avoids converting dtypes to float or losing precision on rounding. `[:,:,0]` is enough to say "all i-s, j-s" of `[i,j][0]`. Test it. – user3666197 Sep 30 '14 at 11:42
  • @David if the posted recipes helped your image processing, please post your performance results and share them with the Stack Overflow community here. Thanks for doing that, David. – user3666197 Sep 30 '14 at 12:29
  • @David if the proposed speed-up is not enough for your overall performance, let me know and we can drill down into an even more aggressive approach: avoiding file I/O, raising performance to digital-factory grade, and perhaps taking some steps towards a massively distributed processing scheme. OK? – user3666197 Sep 30 '14 at 12:39
  • Thanks for the info. Let me give this a try and I'll post the results. – David Sep 30 '14 at 13:48
  • My initial tests indicate that this will be a huge help! Unfortunately, one of the lines is incorrect and I can't figure out the syntax: `arr[:,:,:3] = color[:,:,:3] * ambient[:,:,0]` causes `ValueError: operands could not be broadcast together with shapes (1333,2000,3) (1333,2000)`. It does not seem to realise that a scalar is supposed to be multiplied into each vector. How do I correct this? – David Sep 30 '14 at 17:58
  • @David see the updated syntax, which reduces the numpy vectorisation dimensionality by one, to just 2D, by working on one channel at a time. Looking forward to your performance measurements. – user3666197 Sep 30 '14 at 18:31
  • OK, using the code I went from 22.14 seconds to 0.1624 seconds! The bottom code in your post does not run. I used something similar to the code you have on top (I didn't see it until I had corrected mine). You may want to edit it so that the final answer has the correct code: `arr[:,:,0] = color[:,:,0] * (ambient[:,:,0] / 255.0)`; `arr[:,:,1] = color[:,:,1] * (ambient[:,:,0] / 255.0)`; `arr[:,:,2] = color[:,:,2] * (ambient[:,:,0] / 255.0)`; `arr[:,:,3] = shine[:,:,0]` – David Sep 30 '14 at 18:43
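Why David's one-line version raised `ValueError`: NumPy broadcasting compares shapes starting from the last axis, so `(1333, 2000, 3)` against `(1333, 2000)` pairs 3 with 2000 and fails. Keeping a length-1 channel axis on the factor (with `np.newaxis` or a `:1` slice) lets it stretch across R, G and B. A minimal sketch on small hypothetical arrays:

```python
import numpy as np

height, width = 4, 5  # small stand-in for the 1333 x 2000 maps in the question
rng = np.random.default_rng(0)
color = rng.integers(0, 256, size=(height, width, 4)).astype(np.float64)
ambient_mod = rng.random((height, width, 4))  # hypothetical per-pixel factors in [0, 1)

# Shapes are compared from the last axis: 3 vs 5 here (3 vs 2000 in the question).
try:
    color[:, :, :3] * ambient_mod[:, :, 0]
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Keep a length-1 channel axis so each pixel's factor is stretched across R, G, B.
scaled = color[:, :, :3] * ambient_mod[:, :, 0, np.newaxis]  # (4, 5, 3) * (4, 5, 1)
scaled_slice = color[:, :, :3] * ambient_mod[:, :, :1]  # slicing 0:1 also keeps the axis

# Same values as the per-channel form David posted.
per_channel = np.stack([color[:, :, c] * ambient_mod[:, :, 0] for c in range(3)], axis=-1)
print(broadcast_failed, np.array_equal(scaled, per_channel), np.array_equal(scaled, scaled_slice))
# prints: True True True
```

Either form gives the same result as the three per-channel lines; which one is faster on real maps is worth measuring with the timing approach the answer recommends.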