Improve CPU time of conditional statement

Question

I have written an if-elif statement, which I believe not be very efficient:

first_number = 1000
second_number = 700
switch = {
    'upperRight': False,
    'upperLeft': False,
    'lowerRight': False,
    'lowerLeft': False,
    'middleLeft': False,
    'middleRight': False,
    'upperMiddle': False,
    'lowerMiddle': False,
    'middle': False
}

for i in range(first_number):
    for j in range(second_number):
        if pixel_is_black:
            if i <= int(first_number/3) and j <= int(second_number/3):
                switch['upperLeft'] = True
            elif i <= int(first_number/3) and int(second_number/3) < j <= int(2*second_number/3):
                switch['middleLeft'] = True
            elif i <= int(first_number/3) and j > int(2*second_number/3):
                switch['lowerLeft'] = True
            elif int(first_number / 3) <= i < int(2 * first_number / 3) and j < int(second_number / 3):
                switch['upperMiddle'] = True
            elif int(first_number / 3) <= i < int(2 * first_number / 3) and int(second_number / 3) < j <= int(2 * second_number / 3):
                switch['middle'] = True
            elif int(first_number / 3) <= i < int(2 * first_number / 3) and j >= int(2 * second_number / 3):
                switch['lowerMiddle'] = True
            elif i >= int(2 * first_number / 3) and j <= int(2 * second_number / 3):
                switch['upperRight'] = True
            elif i >= int(2 * first_number / 3) and int(second_number / 3) < j <= int(2 * second_number / 3):
                switch['middleRight'] = True
            elif i >= int(2 * first_number / 3) and  j >= int(2 * second_number / 3):
                switch['lowerRight'] = True

for i in switch:
    if(switch[i] == True):
        print(i)

As you can see, the statement looks pretty ugly. Since number is big, it takes almost 2 seconds for the execution. In the loop, I am going through the pixels of an image. In the if-elif statement, I divide the image in 9 parts, and print the respective number, if the pixel is black in that area.

Is there any way I could lower the CPU time?

I tried this answer, but my statement conditions are different.

Thank you.

What is `some_condition` - what is `something`... Why do you need both `i` and `j` and make 1,000,000 iterations... you can already know/compute `j`'s status based on just `i` — Jon Clements, Jun 28 '17 at 09:24
@JonClements I will edit some lines to the if statement to make the code more clear — Snow, Jun 28 '17 at 09:30
It's likely you don't have to go through those for loops at all. Also numpy would be brilliantly fast for a task like this. However, it's impossible to know how to help you actually design this system without knowing what you're actually comparing (or an actual similar example). I understand you were going for a minimal example, but the general request is for a [minimal, *complete and verifiable* example](https://stackoverflow.com/help/mcve) so that others can reproduce the behavior and get you your desired result. Help others help you. FWIW if/else is quite speedy, it's the loops killing you. — alkasm, Jun 28 '17 at 09:48
Are all of your `something` values related to which third of the iteration you're in for your two variables? If so, it should be possible to compute the key you want into your `switch` dict directly (without any `if`/`elif`/`else` nonsense). But you've abstracted away so much of your code that it's not clear if that's what you're trying to do or not. — Blckknght, Jun 28 '17 at 10:10
@Blckknght I edited again. In the loop, I am going through the pixels of an image. In the if-elif statement, I divide the image in 9 parts, and print the respective number, if the pixel is black in that area. — Snow, Jun 28 '17 at 11:31
@JohnSmith what data type are you using for the image? Is a numpy solution okay with you or are you avoiding it? This is super quick in numpy. — alkasm, Jun 28 '17 at 20:03
@AlexanderReynolds I am using numpy to get the width and height from the image, so I have no issue with that. — Snow, Jun 28 '17 at 20:04
@AlexanderReynolds I calculate the pace = (first_number * j) + i, and if data[pace] != (0, 0, 0, 255): continue loop. — Snow, Jun 28 '17 at 20:13

alkasm · Accepted Answer · 2017-06-30T16:51:58.220

2

Since you're working with images and using numpy, I think the easiest thing to do would be split up the image into blocks and see if any pixels inside those blocks are black. For example, suppose I have an edge image where the middle doesn't have any black pixels, like this:

We can use a list comprehension to turn the image into blocks:

h, w = img.shape[:2]
bh, bw = h/3, w/3

bw_ind = [0, int(bw), 2*int(bw), w]
bh_ind = [0, int(bh), 2*int(bh), h]
blocks = [img[bh_ind[i]:bh_ind[i+1], bw_ind[j]:bw_ind[j+1]] for i in range(3) for j in range(3)]

Now just to make things a little simpler, we can make a list of the keys in your list dictionary in the same order that the blocks are in; that way, blocks[0] would correspond to switch_list[0] which is "upperLeft".

switch_list = [
    'upperLeft', 'upperMiddle', 'upperRight',
    'middleLeft', 'middle', 'middleRight',
    'lowerLeft', 'lowerMiddle', 'lowerRight'
    ]

Then the last thing to do is find the black pixels in each block. So we want to go through the 9 blocks (using a loop) and then compare the values inside the block to whatever color we're interested in. If you had a 3-channel 8-bit image, then black is usually represented with a 0 in each channel. So for a single pixel, if it was black, then we could compare it to black with pixel == [0,0,0]. But this returns a boolean for each value:

>>> pixel == [0,0,0]
[True, True, True]

The pixel is only black when all three of these values match, so we can use .all() on the result, which will only return True if the whole array is True:

>>> (pixel == [0,0,0]).all()
True

So this is our indicator that a single pixel is black. But we need to check if any pixel is black inside our block. Let's go down to a single channel image for simplicity first. Suppose we have the array

M = np.array([[0,1], [2,3]])

If we used a logical comparison here, M == 5, we would return an array of booleans, the same shape as M, comparing each element to 5:

>>> M == 5
array([[False, False] [False, False]])

In our case, we don't need to know every comparison, we just want to know if a single pixel inside the block is black, so we just want a single boolean. We can use .any() to check if any value was True inside M:

>>> (M == 5).any()
False

So we need to combine these two operations; we'll make sure that all the values match our color of interest ([0,0,0]) in order to count that pixel, and then we can see if any of our pixels returned True from that comparison inside each block:

black_pixel_in_block = [(block==[0,0,0]).all(axis=2).any() for block in blocks]

Note the axis=2 argument: .all(axis=2) will reduce the multi-channel image into a single channel of booleans; True at a pixel location if the color matched in every channel. And then we can check if any of the pixel locations returned true. This reduces to a boolean for each block, telling if it contained the color. So we can set the dictionary values to True or False depending on whether or not a black pixel was found:

for i in range(len(switch_list)):
    switch[switch_list[i]] = black_pixel_in_block[i]

And finally, print the result:

>>> print(switch)
{'upperRight': True, 
'upperLeft': True, 
'lowerRight': True, 
'lowerLeft': True, 
'middleLeft': True, 
'middleRight': True, 
'upperMiddle': True, 
'lowerMiddle': True, 
'middle': False}

The operations alone here took ~0.1 seconds on an (2140, 2870) image.

Along the same lines you could first create a matrix of True, False values for the whole image with .all(), and then split that up into blocks, and then use .any() inside the blocks. This would be better for memory since you're storing 9 (h,w) blocks instead of 9 (h,w,depth) blocks.

edited Jun 30 '17 at 16:51

answered Jun 28 '17 at 21:04

alkasm

22,094
5
78
94

Thank you. +1 for the good answer. I have one question. What is img in your context? Is it the array with the data that you get from the image? Do you use `np.array(img)`to get the data? Because I use the `.getdata()` method, in the data array that I mentioned in the comment.(and that has tuples as indexes) For h and w I used `img.size`. – Snow Jun 29 '17 at 07:28
If you indeed use `np.array` , then block is regarded as bool, and `.any()` would result in an attribute error. – Snow Jun 29 '17 at 07:40
1

No, I read the image in OpenCV, which reads them as a multidimensional numpy array. The same is true of many libraries, e.g. `scikit-image` and `scipy`'s `ndimage`. If you're using `PIL` it's simply `np.array(pic)` not `np.array(pic.getdata())`. [The answer here](https://stackoverflow.com/questions/384759/pil-and-numpy) shows reading an image into a numpy array with `PIL`. Or check out the [`ndimage` docs from `scipy`](http://www.scipy-lectures.org/advanced/image_processing/). – alkasm Jun 29 '17 at 10:09
Yes, I read the data using `np.array(img)`, and as I mentioned in the comment above: `.any()` results in an `Attribute error: 'bool' object has no attribute 'any'`. You did not get an error? – Snow Jun 29 '17 at 11:15
1

@JohnSmith Oh I see. `.any()` is a method for a list of `bool`s, not for a single `bool`. I don't know why you're getting a single `bool` from that comparison. Could you update your original post with your code that produces that error? – alkasm Jun 29 '17 at 11:22
I edited the post. So for example, an entry in the `imgarr` would be [0 0 0 255] – Snow Jun 29 '17 at 11:38
1

Well then you should be comparing `block` to a four element vector! :) My image was just RGB, not RGBA, so I only had three values. However, I just noticed an error in my code: `.any()` means if ANY of the r,g,b values matched 0; you should use `.all(axis=2)` to make sure that you're only marking it true when the pixel color is identical in all channels, and *then* use `.any()` to see if any pixel did match along all three channels. See the edited line with comment in the code on my post. You can remove the code from your original post if you want (and if yours is working now). – alkasm Jun 29 '17 at 12:28
so in your case `.all(axis=2)` makes sure that three first channels are identical? And `.any()` returns the boolean value that the 3 channels have? I am sorry for asking these many times, but `.all(axis=2).any()` is a bit confusing to me – Snow Jun 29 '17 at 13:34
I am asking, because how it is now, the result is different from the expected result. If for example the image is black in the `upperMiddle`, the console prints `lowerMiddle`. Or if it is `middleRight`, it prints `middleLeft`. – Snow Jun 29 '17 at 14:00
1

Oops. I ordered the labels incorrectly (or constructed the blocks the wrong direction---whichever way you want to look at it). My example didn't catch that because all but the middle were correct. Super easy fix to `switch_list` (just transposing it) would work. Editing my post now. And yes your understanding of the boolean bit is correct. – alkasm Jun 29 '17 at 18:33
Perfect, now it's almost all correct, except that after some testing, I noticed that another key, `middlemiddleRight` , is also printed in the end. (even though that key doesn't exist in the dictionary) Do you have any idea why this is occurring? I also noticed that the 10th key's name changes with different images. I can make the program print only the 9 first keys, but I'm curious to know why is there a 10th key created. – Snow Jun 29 '17 at 20:26
*sigh* I forgot the comma in the list. This is what happens when I write code while on-the-go. – alkasm Jun 29 '17 at 21:26
Now it's perfect. And it's 3 times faster! I still don't understand well one line of the code, but I will spend more time to get it. Answer accepted :) – Snow Jun 30 '17 at 09:02
Yeah, does your timing include reading the image? Because that's probably the bulk of your time spent---these calculations should be on the order of tens or hundreds of ms. – alkasm Jun 30 '17 at 13:54
Yes, the timing actually includes all my script, which has much more image loading and writing, and some other calculations. So, for this part of the code, it's probably much more than 3 times faster. – Snow Jun 30 '17 at 13:57
1

Okay, I calculated just that part of the code, and it's actually **67.8** times faster haha. I wish I could give you more than one upvote :) – Snow Jun 30 '17 at 14:05
1

@JohnSmith alright I updated the post with a whole lot of comments to walk through each piece of code. I hope that explained everything well for you and others that stumble upon this. – alkasm Jun 30 '17 at 14:35
I'll have to wait a day before I award the bounty to you, I don't actually need a better answer than this :) – Snow Nov 30 '17 at 10:45
@JohnSmith Cool! I forgot about this answer! – alkasm Nov 30 '17 at 10:48
yea, I didn't have enough rep at the time you answered – Snow Nov 30 '17 at 10:51
1

@JohnSmith looking back at this I actually can think of an even faster way to do this I think, but I'll have to tackle it tomorrow. :) – alkasm Nov 30 '17 at 13:08

score 1 · Answer 2 · answered Jun 28 '17 at 17:38

You can avoid the whole if/elif/else mess in this case, since your conditions can be replaced by a direct calculation of the appropriate key into the dictionary from the i and j values.

for i in range(first_number):
    horizontal_third = i * 3 // first_number 
    for j in range(second_number):
        if pixel_is_black(i, j): # I assume this depends on the coordinates in some way
            vertical third = j * 3 // second_number
            key = str(horizontal_third * 3 + vertical_third + 1) # use int keys? zero-index?
            switch[key] = True

Note that it would probably be a good idea to give more meaningful names to some of your variables. first_number might become width and second_number could become height (or vise versa) and the i and j values could become x and y (though the latter two are less bad, since i and j are pretty traditional as loop variables).

If you were willing to change switch a bit to further improve performance, you could replace the dictionary with a list, using zero-based integers as indexes. You'd just need to remove the + 1 and str calls from the key calculation (I'd also rename the variable index). You can initialize the list with switch = [False] * 9.

Thank you. +1 for making the code faster. However, I am not really using integers as keys. I didn't anticipate an answer that would use the keys of the dictionary to improve the speed; I am sorry for that. I am going to edit the code with the real keys. And indeed, the variables are named width and height, in the real code :) — Snow, Jun 28 '17 at 18:22
You can probably make it work with your arbitrary dict keys, just add another layer of indirection. Use the number I calculate in my code to look up the key (in a list or other dict) and then use that key to index your main dictionary. — Blckknght, Jun 28 '17 at 19:00

Improve CPU time of conditional statement

2 Answers2

Linked