1

I want to reduce the runtime of my algorithm. Can anyone suggest ways to reduce its complexity other than using threads or parallel computing.

Algorithm: This algorithm aims to solve Pearson's r where the function is accepting a matrix X, a vector y, and (m, n) sizes. Currently, I am using nested loops to solve the problem but I am trying to find a solution to improve the runtime of this algorithm as currently in theoretical time complexity this requires O(mxn) time. which is when n and m is equal to 100,000 or more, my computer lags or crashes as n and m increases.

def pearson_cor(X, y, m, n):
    v = []
    for i in range(m):
        sum_X = 0
        sum_y = 0
        sum_Xy = 0
        sqSum_X = 0
        sqSum_y = 0
        
        for j in range(n):
            sum_X = sum_X + X[i][j]
            
            sum_y = sum_y + y[j]
            
            sum_Xy = sum_Xy + (X[i][j] * y[j])

            sqSum_X = sqSum_X + X[i][j]**2
            sqSum_y = sqSum_y + y[j]**2

        v.append((float)((n * sum_Xy) - (sum_X) * (sum_y))/(float)(isqrt(((n * sqSum_X) - sum_X**2)* ((n * sqSum_y) - sum_y**2),2)))
    return v
Gie Grajo
  • 197
  • 1
  • 8
  • 3
    If you need to read every element of an m by n matrix, you can't beat O(mn) time. Using numpy/scipy is likely to be a lot faster though: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html – Paul Hankin Sep 25 '22 at 12:04
  • 1
    If the matrix is sparse, you can store it another way. – Nelfeal Sep 25 '22 at 12:23
  • @PaulHankin , I've tried the scipy one and this is so much faster but similar results with my formula. like from 12 seconds to 0.19 when n = 2000. That was awesome, I just change the innerloop to scipy function. Do you know why this happen? – Gie Grajo Sep 25 '22 at 12:28
  • 2
    Yes, it converts your python matrix and vector into an array of doubles which can be operated on efficiently by the CPU, and then uses optimized C functions to perform vectorized math. If you're doing mathsy code, you should use numpy yourself. – Paul Hankin Sep 25 '22 at 12:36
  • @PaulHankin, Regarding this one ```it converts your python matrix and vector into an array of doubles which can be operated on efficiently by the CPU``` can I replicate it myself to my code? – Gie Grajo Sep 25 '22 at 12:41
  • 1
    Yes, you can use numpy to store your vectors and matrices rather than native python lists. And you'll have to rewrite your code to use vectorized operations -- it's a big topic (too large for a stackoverflow comment) -- but see any introductory docs on numpy. – Paul Hankin Sep 25 '22 at 12:44
  • 2
    There's a link to the scipy source in the first link I shared -- you can see exactly what it's doing (once you get past the 100 screenfuls of documentation). – Paul Hankin Sep 25 '22 at 12:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/248335/discussion-between-gie-grajo-and-paul-hankin). – Gie Grajo Sep 25 '22 at 18:11

0 Answers0