3

Edit

I created a similar question which I found more understandable and practical there: How to copy a 2D array (matrix) from python with a C function (and do some computer heavy computation) which return a 2D array (matrix) in python?

Original question

I want to use C in python to perform a computation an all entry of a big non square matrix of size n times m. I copied the code from the excellent tutorial there: https://medium.com/spikelab/calling-c-functions-from-python-104e609f2804. The code there is for a square matrix

I first compiled the c_sum.c script

$ cc -fPIC -shared -o c_sum.so c_sum.c

and then ran the python script:

$ python main.py

and that ran well. However if I set the values of n and m in the main.py to different values, I get a segmentation fault. I guess one has to allocate memory separately for n and m but my knowledge of C is to rudimentary to know how to do it. What would be a code that would work with, let's say, m=3000 and n=2000?

Here are the script c_sum.c:

#include <stdlib.h>

double * c_sum(const double * matrix, int n, int m){
    double * results = (double *)malloc(sizeof(double) * n);
    int index = 0;
    for(int i=0; i< n*m; i+=n){
        results[index] = 0;
        for(int j=0; j<m; j++){
            results[index] += matrix[i+j];
        }
        index += 1;
    }
    return results;
}

Here is the main.c script:

# https://medium.com/spikelab/calling-c-functions-from-python-104e609f2804
from ctypes import c_void_p, c_double, c_int, cdll
from numpy.ctypeslib import ndpointer
import numpy as np
import time
import pdb

def py_sum(matrix: np.array, n: int, m: int) -> np.array:
    result = np.zeros(n)
    for i in range(0, n):
        for j in range(0, m):
            result[i] += matrix[i][j]
    return result

n = 3000
m = 3000
matrix = np.random.randn(n, m)

time1 = time.time()
py_result = py_sum(matrix, n, m)
time2 = time.time() - time1
print("py running time in seconds:", time2)
py_time = time2

lib = cdll.LoadLibrary("c_sum.so")
c_sum = lib.c_sum
c_sum.restype = ndpointer(dtype=c_double,
                          shape=(n,))

time1 = time.time()
result = c_sum(c_void_p(matrix.ctypes.data),
            c_int(n),
            c_int(m))
time2 = time.time() - time1
print("c  running time in seconds:", time2)

c_time = time2
print("speedup:", py_time/c_time)

ecjb
  • 5,169
  • 12
  • 43
  • 79

1 Answers1

1

I assume you want to compute sum along last axis for a (n,m) matrix. Segmentation fault occurs when you access memory which you have no access. The issue lies in the the erroneous outer loop. You need to iterate over both dimensions but you iterate over same dimension twice.

double * results = (double *)malloc(sizeof(double) * n); /* you allocate n doubles. 
         Do you free this Outside function? If not, you are having a memory leak. 
        An alternative way is to pass the output array to function, so that you can avoid creating memory in the function*/

for(int i=0; i< n*m; i+=n){ /* i+=n => you are iterating for m times. also you are iterating over last dimension */

        results[index] = 0; /* when index > n ; you are accessing data which 
                           you don't have access leading to segmentation fault */
        for(int j=0; j<m; j++) /* you are iterating again over last axis*/
        {
            results[index] += matrix[i+j];
        }

        index += 1; /* this leads to index > n as you iterate for m times and m>n in this case.
                   For a square matrix, m=n, so you don't have any issue */
    }

TLDR: To fix the segmentation fault, you need to replace for(int i=0; i< n*m; i+=n) with for(int i=0; i< n*m; i+=m) so that you only iterate for n times and over both dimensions.