C optimize algorithm with huge double array loaded from mysql

Question

I am loading values from mysql into array and it takes around 20 minutes. dictionary table got 3 columns word1 (INT), word2(INT), distance(double) word1 and word1 got values from 0 to 57999 some values of word2 are skipped because there are only 180 million rows (not 58000 * 58000)

int i;
double **word;
word=(double **) calloc(58000, sizeof(double *));
for(i=0;i<58000;i++)
    word[i]=(double *) calloc(58000, sizeof(double));

for(word1=0;word1<58000;word1++) {
        char query[600];
        sprintf(query, "SELECT word1, word2, distance from dictionary where word1='%d'", жорд1);
        mysql_query(con, query);
        result = mysql_store_result(con);
        while (row = mysql_fetch_row(result)) {
            double distance;
            word2 = atoi(row[1]);
            sscanf(row[2], "%lf", &distance);
            word[word1][word2] = distance; 
        }
        mysql_free_result(result);
}

This array takes 14 gb of ram and 7gb of virtual memory. Now I can use this array for further calculations.

for(word1=0;word1<58000;word1++) {

    for(word2=0;word2<58000;word2++) {
        if(word1 != word2) {
            double result[58000];
            for(i=0;i<58000;i++) {
                result[i] += pow((word[word1][i] - word[word2][i]), 2);
            }
            mysql_query(con, "insert result to mysql");
        }
    }
}

Many of array values are 0 But I need array keys to identify which words I am comparing. How can I improve this algorithm? or How do I load this array into memory once and reuse it next time I execute the program?

Why does the second snippet only use result[word1], ignoring other 57999 elements you allocate? — gus, May 23 '14 at 14:15
This seems like something that could be done entirely from MySQL without the need to bring C into the mess. For example, something like this: https://stackoverflow.com/questions/664700/calculate-a-running-total-in-mysql — Mr. Llama, May 23 '14 at 14:25
Lets assume there are more calculations and it cant be done in mysql — valerij vasilcenko, May 23 '14 at 14:46
Is your problem the load time (which involves 58000 queries) or the computation time (which will be dominated by page-fault handling, since you don't have enough physical memory for the entire array)? — rici, May 23 '14 at 15:23
Just a little note: [do **not** cast the result of `[c|m]alloc()` in C](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). — Crozin, May 23 '14 at 15:39
Have switched on compiler optimization? It should convert the otherwise slow squaring by `pow(x,2)` to an inline multiplication. — Peter G., May 23 '14 at 15:45

C optimize algorithm with huge double array loaded from mysql

0 Answers0