I want to calculate the genetic distance in cM (centimorgans) between every pair of windows along a chromosome. My code has three nested loops. As an example, I use random numbers to stand in for the recombination map:
import random
windnr = 54800
w, h = windnr, windnr
recmatrix = [[0 for x in range(w)] for y in range(h)]
# Generate 54800 random numbers between 10 and 30.
# random.sample() cannot draw more items than the population holds, so use randint().
rec_map = [random.randint(10, 30) for _ in range(windnr)]
for i in range(windnr):
    for j in range(windnr):
        recmatrix[i][j] = 0.25 * rec_map[i]  # mean distance within own window
        if i > j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]  # + mean distance within final window
            for k in range(i - 1, j, -1):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k]  # add all windows between i and j
        if i < j:
            recmatrix[i][j] = recmatrix[i][j] + 0.5 * rec_map[j]  # + mean distance within final window
            for k in range(i + 1, j):
                recmatrix[i][j] = recmatrix[i][j] + rec_map[k]  # add all windows between i and j
    if i % 10 == 0:
        print("window {}".format(i))  # progress report every 10 windows
The calculation takes a very long time: almost 7 days for my data.
Can I speed up the nested for loops so that the run finishes within 10 hours?
How can I improve the performance?
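A possible direction, sketched on a small example: the innermost loop only sums the windows strictly between i and j, so a running (prefix) sum of rec_map could replace it. The function names `recmatrix_loops` and `recmatrix_prefix` and the list `prefix` are made up for this illustration, and the check uses a 40-window map rather than the real data.

```python
import random
from itertools import accumulate


def recmatrix_loops(rec_map):
    """Reference version: same arithmetic as the triple loop above."""
    n = len(rec_map)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = 0.25 * rec_map[i]
            if i != j:
                out[i][j] += 0.5 * rec_map[j]
                lo, hi = min(i, j), max(i, j)
                for k in range(lo + 1, hi):  # windows strictly between i and j
                    out[i][j] += rec_map[k]
    return out


def recmatrix_prefix(rec_map):
    """Same result, with the innermost loop replaced by a prefix-sum lookup."""
    n = len(rec_map)
    # prefix[k] = rec_map[0] + ... + rec_map[k - 1], with prefix[0] = 0
    prefix = [0] + list(accumulate(rec_map))
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = 0.25 * rec_map[i]
            if i != j:
                lo, hi = min(i, j), max(i, j)
                # sum of rec_map[lo + 1 .. hi - 1] in constant time
                between = prefix[hi] - prefix[lo + 1]
                value += 0.5 * rec_map[j] + between
            out[i][j] = value
    return out


# Small check on a hypothetical 40-window map (not the real data)
random.seed(0)
small_map = [random.randint(10, 30) for _ in range(40)]
assert recmatrix_prefix(small_map) == recmatrix_loops(small_map)
```

This leaves about windnr**2 steps instead of roughly windnr**3. Whether that alone brings the real run under 10 hours is untested here; the two remaining loops are still plain Python.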
Although the 2D array has 3 billion items (~96 GB when they are stored as floats), I would rule out disk swapping as the cause, since the server that does the computation has 200 GB of RAM.
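For reference, the memory estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes 64-bit CPython, where each list slot is an 8-byte pointer to a separate float object; the variable names are only illustrative.

```python
import sys

windnr = 54800
n_items = windnr * windnr  # number of entries in the full matrix

# Assumption: 64-bit CPython. A list of lists stores one pointer per entry,
# and each distinct float result is its own object on the heap.
float_obj_bytes = sys.getsizeof(1.5)  # typically 24 on 64-bit CPython
pointer_bytes = 8
list_gb = n_items * (float_obj_bytes + pointer_bytes) / 1e9

# For comparison: the same entries packed as contiguous 8-byte doubles
packed_gb = n_items * 8 / 1e9

print("entries:", n_items)
print("list of lists, approx. GB:", round(list_gb, 1))
print("packed doubles, approx. GB:", round(packed_gb, 1))
```

Under these assumptions the list-of-lists layout accounts for the ~96 GB figure, while a packed array of doubles would need about a quarter of that.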