
I want to calculate the genetic distance (in cM) between pairs of windows along a chromosome. My code has three nested loops. As sample data, I use random numbers to stand in for the recombination map.

import random

windnr = 54800
w, h   = windnr, windnr
recmatrix = [[0 for x in range(w)] for y in range(h)]

#Generate 54800 random numbers between 0 and 30
rec_map = random.sample(range(0, 30), 54800)

for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i] #mean distance within own window
        if i > j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j] #+ mean distance final window
            for k in range(i-1,j,-1):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k] #add all windows between i and j
        if i < j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j] #+ mean distance final window
            for k in range(i+1,j):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k] #add all windows between i and j
        #j += 1
    if i % 10 == 0:
        print("window {}".format(i))
    #i += 1


The calculation takes a lot of time: almost 7 days for my data.
Can I speed up the nested for loops to finish within 10 hours? How can I increase the performance?

Although the 2D array has 3 billion items (~96 GB when stored as Python floats), I would rule out hard-disk swapping issues, since the server doing the computation has 200 GB of RAM.

Thomas Weller
Sandy
  • Some explanation of what the purpose of your algorithm is would help. – D Malan Jul 16 '20 at 14:08
  • 3
    Take a look at the [`numpy`](https://numpy.org/) library. – 0x5453 Jul 16 '20 at 14:09
  • 5
    How fast is it now? How fast does it need to be? For every performance improvement, ask yourself these two questions. – Thomas Weller Jul 16 '20 at 14:11
  • "Can I speed up the nested for loop?" - almost always, unless it has been highly optimized already. – Thomas Weller Jul 16 '20 at 14:13
  • 1
    Side-note: Remove your `j += 1` and `i += 1` lines; they're immediately ignored (being replaced by the next `i` or `j` to come out of the `range`). – ShadowRanger Jul 16 '20 at 14:15
  • @DelenaMalan Thanks, I have updated my question. – Sandy Jul 16 '20 at 14:21
  • @0x5453 Thanks, I will google it. – Sandy Jul 16 '20 at 14:22
  • @ShadowRanger Thanks, I have updated my code. – Sandy Jul 16 '20 at 14:24
  • 2
    @ThomasWeller: It's a lot more than 24 GB, given it's a `list` of `list`s, not 2D `numpy` array. The `list` of `list`s would *start* at around 24 GB, but as the values become hand-computed `float`s, the `float`s would each occupy 24 bytes on top of the 8 byte pointer storage, so total usage would quadruple to 96 GB. This code needs `numpy`, both to reduce memory usage/fragmentation and to push a lot more work to optimized C loops. – ShadowRanger Jul 16 '20 at 14:34
  • @ShadowRanger: thanks. I indeed only had a look at the initial memory of the 2D lists – Thomas Weller Jul 16 '20 at 14:36
  • I tried your code on 2 different machines, but I always get `ValueError("Sample larger than population or is negative")`. – Thomas Weller Jul 16 '20 at 17:31
  • Did you mean `rec_map = random.choices(range(0, 30), k=54800)`? – Thomas Weller Jul 16 '20 at 17:52
  • Did you consider writing this very piece of code in a fast compiled language, like C++ ? I'm not an early optimization advocate, but this definitely seems like a hell of a bottleneck, and Python has so many nice and simple ways to plug some cpp code in it. – m.raynal Jul 16 '20 at 20:00
  • @m.raynal Actually, I ported this Python script from C++. – Sandy Jul 24 '20 at 07:52

2 Answers


Using NumPy will make your application much faster. Its core is written in C, so it does not suffer from Python's slow loops.


I'm doing my tests on an old Intel Xeon X5550 with 2 sockets, 8 cores and 96 GB of triple-channel RAM. I don't have much experience with Numpy, so bear with me if the code below is not optimal.

Array initialization

Already the initialization is much faster:

recmatrix = [[0 for x in range(w)] for y in range(h)]

needs 24 GB of RAM (integers) and takes 3:28 minutes on my PC. Whereas

recmatrix = np.zeros((windnr, windnr), dtype=np.int64)

is finished after 50 ms. But since you need floats anyway, start with floats from the beginning:

recmatrix = np.zeros((windnr, windnr), dtype=np.float64)
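
As a side note (my own addition, not part of the original answer): the footprint of the float64 matrix is easy to predict from `nbytes`, without allocating the full 54800×54800 array first:

```python
import numpy as np

# A small float64 array confirms the cost: exactly 8 bytes per element.
small = np.zeros((100, 100), dtype=np.float64)
assert small.nbytes == 100 * 100 * 8

# Extrapolate to the full problem size (nothing large is allocated here).
windnr = 54800
full_bytes = windnr * windnr * 8
print(full_bytes / 1024**3)  # ~22.4 GiB (about 24 GB) for the float64 matrix
```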

Random samples

The code

#Generate 54800 random numbers between 0 and 30
rec_map = random.sample(range(0, 30), 54800)

did not work for me, so I replaced it and increased k for more stable measurements

rec_map = random.choices(range(0, 30), k=5480000)

which runs in 2.5 seconds. The numpy replacement

rec_map = np.random.choice(np.arange(0, 30), size=5480000)

is done in 0.1 seconds.

The loop

The loop will need the most work, since with NumPy you avoid Python loops whenever possible.

For example, if you have an array and want to multiply all elements by 2, you would not write a loop but simply multiply the whole array:

import numpy as np

single = np.random.choice(np.arange(0, 10), size=100)
doubled = single * 2
print(single, "\r\n", doubled)

I don't fully understand what the code does, but let's apply that strategy to the first part of the loop. The original is

for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i] #mean distance within own window

and it takes 18.5 seconds with a reduced windnr = 5480. The numpy equivalent should be

column = 0.25 * rec_map_np
recmatrix = np.repeat(column, windnr).reshape(windnr, windnr)

and is done within 0.25 seconds. Also note: since we're assigning the variable here, we don't need the zero initialization at all.
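
A quick way to gain confidence in such rewrites (my own sketch, not part of the original answer) is to compare the vectorized result against the plain loop on a tiny windnr. Note that np.repeat returns a flat array, so a reshape is needed to get the 2D matrix:

```python
import numpy as np

windnr = 5
rec_map_np = np.random.choice(np.arange(0, 30), size=windnr).astype(float)

# Plain-loop version of the first part of the original loop.
loop_result = np.empty((windnr, windnr))
for i in range(windnr):
    for j in range(windnr):
        loop_result[i, j] = 0.25 * rec_map_np[i]

# Vectorized version: repeat each row's value windnr times, then reshape.
vec_result = np.repeat(0.25 * rec_map_np, windnr).reshape(windnr, windnr)

assert np.allclose(loop_result, vec_result)
```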

For the if i>j: and if i<j: parts, I see that the first line is identical

recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]

That means this calculation is applied to all elements except the ones on the diagonal. You can use a mask for that:

mask = np.ones((windnr, windnr), dtype=bool)
np.fill_diagonal(mask, False)
rec_map_2d = np.broadcast_to(0.5 * rec_map_np, (windnr, windnr))  # value depends on the column index j
recmatrix[mask] += rec_map_2d[mask]

This took only 1:20 minutes for all 54800 elements, but reached my RAM limit at 93 GB.
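
Going one step further (my own sketch, not from the original answer): the inner k loops only ever sum a contiguous slice of rec_map, so prefix sums (np.cumsum) can replace them entirely, making every matrix entry an O(1) lookup. Verified against the original triple loop on a tiny windnr:

```python
import numpy as np

windnr = 6
rng = np.random.default_rng(0)
rec_map = rng.integers(0, 30, size=windnr).astype(float)

# csum[k] = rec_map[0] + ... + rec_map[k-1], so any slice sum
# rec_map[a:b].sum() becomes csum[b] - csum[a].
csum = np.concatenate(([0.0], np.cumsum(rec_map)))

i = np.arange(windnr)[:, None]   # row indices, shape (windnr, 1)
j = np.arange(windnr)[None, :]   # column indices, shape (1, windnr)
lo = np.minimum(i, j)
hi = np.maximum(i, j)

# Sum of all windows strictly between i and j (zero when adjacent;
# the diagonal value is discarded by np.where below).
between = csum[hi] - csum[np.minimum(lo + 1, hi)]

recmatrix = 0.25 * rec_map[i] + np.where(
    i != j, 0.5 * rec_map[j] + between, 0.0)

# Cross-check against the logic of the original triple loop.
check = np.empty((windnr, windnr))
for a in range(windnr):
    for b in range(windnr):
        check[a, b] = 0.25 * rec_map[a]
        if a != b:
            check[a, b] += 0.5 * rec_map[b] + rec_map[min(a, b)+1:max(a, b)].sum()

assert np.allclose(recmatrix, check)
```

For the full 54800×54800 case the index matrices are themselves large, so you would apply the same formula block-wise over groups of rows to stay within RAM.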

Thomas Weller

In Python, looping usually takes a lot of time. So if possible, use map in your case; this can save you a lot of time. Since you are iterating over a list, it should suit this script.

example:

def func(x):
    ...  # your code for one element
nu = (1, 2, 3, 4)
output = map(func, nu)
print(output)
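
One caveat worth illustrating (my addition; see also the comment below): on Python 3, `map` is lazy, so printing the map object does not run the function at all. You need to materialize the result, e.g. with `list()`. The `double` function here is just a stand-in for illustration:

```python
def double(x):        # hypothetical mapping function
    return x * 2

nu = (1, 2, 3, 4)
output = map(double, nu)
print(output)         # only shows something like <map object at 0x...>
print(list(output))   # prints [2, 4, 6, 8]
```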
Dip Ghosh
  • 1
    `map` is a microoptimization, one which provides no real benefit when the mapping function is implemented in Python, and a fairly small (<10% boost) even when the mapping function is a built-in implemented in C (the only time to consider `map`). And your example would only work on Python 2 anyway (on Python 3, it's lazy, so `print` would just tell you it's a `map` object without having run the mapping function even once). – ShadowRanger Jul 16 '20 at 14:30