how to effectively loop over matrix elements?

Question

I have to loop over a 800 000* 800 000 matrix. I tried to do that by simple loops but it take me so huge time. How can I loop fastly ?

for in in xrange(800000):
   for j in xrange(800000):
      print i,j

Typically, I am reading an image using OpenCV, then I need to loop over every pixel in order to perform some calculations and comparisons with the values of certain characteristics of the neighbors of the pixel. When I run a loop, I feel it needs more than 2 days to be finished.

In fact, I want to implement my own version of GrowCut algorithm. The authors claim to execute the algorithm in less than 4 minutes using a computer like mine. However, looping over 1200*1100 matrix takes so much time (I tested that). How can I read them quickly ?

What languages are you comparing here? You won't be able to beat C speed with python. No matter how hard you try :) — cel, Feb 19 '15 at 15:27
Also could you try and isolate a piece of code where the loop is in some sort of a standalone (even if you have to cut some of the operations out). Python loops work degeneratively slow almost, however it could also indicate an issue with your code. Perhaps looping over pixels which have a value over a treshold can help (with np.where) etc... — ljetibo, Feb 19 '15 at 15:32
@ljetibo I need absolutely to read the RGB values of each pixel and compare them with the values of the neighbors of that pixel (check the link I gave, if you want ... the person programmed it in Python too, however I get errors when running it) — , Feb 19 '15 at 15:35
Have you seen: http://continuum.io/blog/numba_growcut They provide an implementation and claim that Numba speeds up GrowCut from 8hrs to under a minute. — tom10, Feb 19 '15 at 16:02
@tom10 yes, you are right in what you say. I just have troubles to install Numba on Windows 7. I am thinking to move on Liunx (Ubuntu) because it seems it is easy to install numba on it — , Feb 19 '15 at 16:12
The free anaconda version includes Numba, so that would be one easy way to give it a try. It might mess up the path to your current distribution, but mostly it sits off to the side. And for doing the type work you are doing, anaconda probably has lots of other good tools for you. Personally, I installed it as a test and it's now mostly all I use. (Even if you go to Linux you might want to consider it.) — tom10, Feb 19 '15 at 16:24
@tom10 Yes, Anaconda has many interesting tools, however it does not have Numba ([link](http://docs.continuum.io/anaconda/index.html) to available tools within it) — , Feb 19 '15 at 16:36
Try this link, that lists all of the (default) packages: http://docs.continuum.io/anaconda/pkg-docs.html Numba is included. Plus, I know I have it in my (OSX) Anaconda distribution. — tom10, Feb 19 '15 at 16:41
@tom10 wow !! very nice !!! I did not see that ! How can I call to OpenCV from Anaconda ? (it is not included in it, so I need to install it separately) -P.S. If you think this is a question to ask separetely, please let me know — , Feb 19 '15 at 16:51
For OpenCV, personally I looked here: https://binstar.org/search?q=opencv (binstar is a place to share anaconda packages) but there's also this: http://stackoverflow.com/questions/23119413/how-to-install-python-opencv-through-conda From my memory OpenCV is much easier to install on Windows than OSX so you'll have a few options. — tom10, Feb 19 '15 at 16:58
@tom10 you have been of a great help to me. I have a question about conda issue within anconda. I asked it as a different question. — , Feb 19 '15 at 17:10
I see your new question, but unfortunately I can't help much because I don't have Windows at the moment. All I can say is that from my experience getting Anaconda working correctly is likely worth any effort it might take: from your question it looks like it's going to take you a bit of time to get the paths right, but there are numerous packages in the scientific stack that are difficult to install and Anaconda makes most of this work dramatically easier. — tom10, Feb 19 '15 at 18:03
@tom10 I understand and I thank you so much, you helped me a lot already. I resolved the problems of path, I have just an other issue that I will try to know why. Best regards, nice man — , Feb 19 '15 at 18:05

Tui Popenoe · Answer 1 · 2015-02-19T15:44:27.713

1

For performant array looping, you can use Cython. You can use most of the syntax of Python, with a lot of the performance gains of using C. It is also compatible with NumPy.

Iterating over arrays with Cython.

edited Feb 19 '15 at 15:44

answered Feb 19 '15 at 15:30

Tui Popenoe

2,098
2
23
44

2

That's more a comment than a real answer to the problem :) – cel Feb 19 '15 at 15:32

Tarik · Answer 2 · 2015-02-19T15:55:34.173

0

You may consider using the built in bytearray type instead of python lists.

You can create a bytearray of size 1200 * 1300 * 4 to represent your matrix. Element i, j will be at i * 1200 * 4 + j * 4 (assuming a 4 byte pixel size)

Looking further, I found that you can use arrays in python to efficiently store pretty much anything efficiently. You can easily calculate the index as explained above.

"The Numeric Python extension (NumPy) defines another array type; see http://www.numpy.org/ for further information about Numerical Python."

edited Feb 19 '15 at 15:55

answered Feb 19 '15 at 15:37

Tarik

10,810
2
26
40

thank you. I really need to loop the matrix element by element (given the nature of GrowCut algorithm) – Feb 19 '15 at 15:39

ljetibo · Answer 3 · 2015-02-19T15:52:37.417

-1

Here's a possible optimization, however as again this will be dependent on the system and other currently running processes etc... and the calculations which you did inside.

import time
import numpy as np
test = np.ones((1200, 1100))

Test 1:

def loops():
    start = time.clock()
    for i in range(1200):
        for j in range(1100):
            a = test[i,j]
    print(time.clock()-start)
    return a

>>> loops()
1.433313120652599
1.0

so you see it's only 1.5 seconds for the loops. Adding calculations inside will stretch this time out significantly, but your loops are the least of that worry. Without providing your code I can't say much more about that.

Alternate approach would be:

def loops():
    start = time.clock()
    for row in test:
        for element in row:
            a = element
    print(time.clock()-start)
    return a

>>> loops()
0.714938339920252
1.0

Do notice how small this sample is (just 1 test) however half the time is indicative of possible enhancements? also you avoid the need to use the a= and can skip allocation of a new variable which should shave off few microseconds.

Show your code, I'm sure there's something else that can be done.

This is win7 Intel i5 Dual Core Python3.4

edited Feb 19 '15 at 15:52

answered Feb 19 '15 at 15:46

ljetibo

3,048
19
25

a quick look at the algorithm link shows 4 nested loops; maybe that's why the OP's time is so much longer than yours? – tom10 Feb 19 '15 at 15:53
@tom10: It's unrealistic to expect this mock test should be even close to the OPs question. However it does show that the loops with `in` statement seem to have faster lookup times. I believe it's not his loops that are causing him issues, it's the calculations he does inside. However I'm unfamiliar with GrowCut. Without explanation and his exact code I can't help much. – ljetibo Feb 19 '15 at 15:57
@tom10 That's quite alright, but as far as I know he could have made a simple mistake on any of the inside loops (whose meaning I can't make out from the code he posted) and could be running it more than necessary. If he used `range` and not `xrange` he's actually creating entire lists if he's using old python. `xrange` is a iterator which is faster. He could have written his own generator perhaps, etc etc... I can't optimize something I don't see. – ljetibo Feb 19 '15 at 16:07
thank you very much. So fast in comparaison to my case. Can you explain me what you did, please ? – Feb 19 '15 at 16:14
1

Sure, but I originally upvoted because running the loops on their own puts a reasonable lower bound on the timing, but looking at the GrowCut algorithm, I then realize your test isn't really relevant. If you ran four nested loops it would be though, imho, at least it would be off probably by a factor of two or ten, but not a *power* of two, ie, in this case probably millions. (I do wish though, there were a way to remove my upvote without making a downvote.) – tom10 Feb 19 '15 at 16:19
@poor: Did this really help you? In your actual code? Or are you just looking at the times I posted in my answer. Because they won't be the same for you. Basically `for __ in __` works the same as `for i in range(start, length_of_object): x = object[i]` on the surface. Bellow however python interpreter can optimize the `for` statement itself. I believe it comes down to the fact that Python calls a `__getitem__` on your objects and treats the entire block of code of `for` loop as an inline statement. This gets rid of a lot of overhead. – ljetibo Feb 19 '15 at 16:23

how to effectively loop over matrix elements?

3 Answers3