
I am a computer science student and some of the things I do require me to run huge loops on a MacBook with a dual-core i5. Some of the loops take 5-6 hours to complete, but they only use 25% of my CPU. Is there a way to make this process faster? I can't change my loops, but is there a way to make them run faster?

Thank you

Mac OS 10.11, Python 2.7 (I have to use 2.7), with IDLE or Spyder on Anaconda

Here is some sample code that takes 15 minutes:

# Imports inferred from the usage below (rnd.randint, np.mean);
# BloomFilter is the asker's own class and is not shown in the question.
import random as rnd
import numpy as np

def test_false_pos():
    sumA = [0] * 1000
    for test in range(1000):
        counter = 0
        # Build a fresh Bloom filter and fill it with 600 random insertions
        bf = BloomFilter(4095, 10)
        for i in range(600):
            bf.rand_inserts()
        # Probe with 10,000 random strings and count how many look like hits
        for x in range(10000):
            randS = str(rnd.randint(0, 10**8))
            if bf.lookup(randS):
                counter += 1
        sumA[test] = counter / 10000.0
    avg = np.mean(sumA)
    return avg
tTIKA
  • do you have 4 processors? – Steven G Oct 07 '16 at 19:03
  • @StevenG: (S)he said, "Macbook with dual core i5". – Fred Larson Oct 07 '16 at 19:06
  • @tTIKA please learn more about `Iterators` and `Generators` from [here](http://pymbook.readthedocs.io/en/latest/igd.html) – Some programmer Oct 07 '16 at 19:10
  • dual core with hyper threading - 4 cores. which is why 25%. but yeah, in order to make this faster you can try a couple of things: `multiprocessing` or play around with `numpy` – acushner Oct 07 '16 at 19:12
  • No, it's not hyperthreaded. Only 2 physical cores. – tTIKA Oct 07 '16 at 19:34
  • @Avi I skimmed through the link but can't really understand what they are. Can you explain it please? – tTIKA Oct 07 '16 at 19:51
  • basically `a generator is a special routine that can be used to control the iteration behaviour of a loop`; and for iterator, have a look at this [`link`](https://www.codementor.io/python/tutorial/python-generators-and-iterators) – Some programmer Oct 07 '16 at 19:59
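
For the record, here is a minimal generator sketch of what that comment describes (generic Python 2, not the asker's code):

def squares(n):
    # A generator: yields one value at a time instead of building a full list first
    for i in xrange(n):
        yield i * i

total = 0
for sq in squares(10 ** 6):   # values are produced lazily, one per loop iteration
    total += sq
print total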

3 Answers

4

Sure thing: Python 2.7 has to generate huge lists and waste a lot of memory each time you use range(<a huge number>).

Try using the xrange function instead. It doesn't create that gigantic list at once; it produces the members of the sequence lazily.

But if you were to use Python 3 (which is the modern version and the future of Python), you'll find that its range is even cooler and faster than xrange in Python 2.
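
As a rough sketch in generic Python 2 (not the asker's code), xrange is a drop-in replacement wherever the loop only needs the numbers one at a time:

big = 10 ** 7
total = 0
for i in xrange(big):   # produces values lazily; range(big) would build a 10-million-element list first
    total += i
print total

As the comments below point out, this mainly saves memory; whether it saves noticeable time depends on how much work the loop body does.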

ForceBru
  • `xrange` saves *memory*, it's not necessarily going to speed up execution. You still have to call `next` on each iteration; that call just computes the next value rather than pulling it from a list. – chepner Oct 07 '16 at 19:10
  • @chepner, well, even allocating such a tremendous amount of memory and filling it with a lot of data just to loop over it (this is what Python 2's `range` does) wastes much time. – ForceBru Oct 07 '16 at 19:12
  • 10000 integers is not a "tremendous" amount of memory. The question is how much time does each Bloom filter operation take relative to the loop overhead. As with anything, benchmark to find out. – chepner Oct 07 '16 at 19:13
  • I just tried just the loops of the 15 minutes sample code (with `pass` in them). Took less than one second. You really think that's noticeable at all? – Stefan Pochmann Oct 07 '16 at 19:34
  • To be more precise: It took about 0.17 seconds with `range` and 0.13 seconds with `xrange`. That's only 0.04 seconds improvement. – Stefan Pochmann Oct 07 '16 at 19:47
  • @StefanPochmann How did you try it without having my Bloom Filter class? I thought that operation was the expensive part. – tTIKA Oct 07 '16 at 19:47
  • @tTIKA Exactly like I described. – Stefan Pochmann Oct 07 '16 at 19:56
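
As chepner and Stefan Pochmann suggest above, a quick way to check how much of the 15 minutes the bare loops account for is to time them with nothing in the body (a generic sketch, not from the answer):

import timeit

empty_range  = "for test in range(1000):\n    for x in range(10000):\n        pass"
empty_xrange = "for test in xrange(1000):\n    for x in xrange(10000):\n        pass"
print timeit.timeit(empty_range,  number=1)   # run the statement once and report seconds
print timeit.timeit(empty_xrange, number=1)

If both numbers come out tiny compared to the full run, the loop construct is not the bottleneck; the BloomFilter work is.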
3

You could split it up into 4 loops:

import multiprocessing
import random as rnd
import numpy as np
# (imports inferred from the usage below; BloomFilter is the asker's own class)

def test_false_pos(times, i, q):
    sumA = [0] * times
    for test in range(times):
        counter = 0
        bf = BloomFilter(4095, 10)
        for _ in range(600):  # use `_` here so the worker index `i` isn't shadowed
            bf.rand_inserts()
        for x in range(10000):
            randS = str(rnd.randint(0, 10**8))
            if bf.lookup(randS):
                counter += 1
        sumA[test] = counter / 10000.0
    # Tag the result with this worker's index so the parent can reassemble the pieces in order
    q.put([i, list(sumA)])

def full_test(pieces):
    processes = []
    q = multiprocessing.Queue()
    steps = 1000 / pieces  # Python 2 integer division; assumes `pieces` divides 1000 evenly
    for i in range(pieces):
        processes.append(multiprocessing.Process(target=test_false_pos, args=(steps, i, q)))
    [p.start() for p in processes]
    results = [None] * pieces
    for _ in range(pieces):
        i, result = q.get()  # blocks until some worker finishes
        results[i] = result
    [p.join() for p in processes]
    # Flatten the array (`results` looks like this: [[...], [...], [...], [...]])
    # source: https://stackoverflow.com/a/952952/5244995
    sums = [val for result in results for val in result]
    return np.mean(np.array(sums))

if __name__ == '__main__':
    print full_test(multiprocessing.cpu_count())

This will run n processes that each do 1/nth of the work, where n is the number of processors on your computer.

The `test_false_pos` function has been modified to take three parameters:

  • `times` is the number of times to run the outer loop.
  • `i` is the worker's index and is passed through to the result.
  • `q` is a multiprocessing.Queue to add the results to.

The function loops `times` times, then places `i` and `sumA` into the queue for further processing.

The main process (full_test) waits for each worker's result, then places it in the appropriate position in the results list. Once the list is complete, it is flattened and the mean is calculated and returned.

Jed Fox
1

Consider looking into Numba and its @jit (just-in-time compiler) decorator. It works for functions that are NumPy-based. It can handle some plain Python routines, but it is mainly for speeding up numerical calculations, especially ones with loops (like doing Cholesky rank-1 up/downdates). I don't think it would work with a BloomFilter, but it is generally super helpful to know about.

In cases where you must use other packages in your flow alongside NumPy, separate out the heavy-lifting NumPy routines into their own functions, put a @jit decorator on top of each of those functions, and then call them from your normal Python code.
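
As a minimal sketch of that pattern (assuming Numba is installed; the function below is purely illustrative, not from the question):

from numba import jit
import numpy as np

@jit(nopython=True)  # compiled to machine code on the first call
def sum_of_squares(arr):
    # Plain Python loop over a NumPy array, but executed as compiled code
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i] * arr[i]
    return total

x = np.random.rand(10 ** 7)
print sum_of_squares(x)   # first call pays the compilation cost; later calls run at native speed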

Mansweet