How to make this code faster

Question

The following piece of code executes 20 million times each time the program is called so I need a way to make this code as optimized as possible. I am not an advanced c++ programmer so I seek this incredible community's help.

int WilcoxinRST::work(GeneSet originalGeneSet, vector<string> randomGenes) {
vector<string> geneIDs;
vector<bool> isOriginal;
vector<int> rank;
vector<double> value;
vector<int> score;
int genesPerSet = originalGeneSet.geneCount();
unsigned int totalGenes, tempScore;
/**
 * Fill the first half of the vectors with original gene set data
 */
totalGenes = genesPerSet * 2;
for (int i = 0; i < genesPerSet; i++) {
    geneIDs.push_back(originalGeneSet.getMemberGeneAt(i));
    isOriginal.push_back(true);
    value.push_back(fabs(expressionLevels.getValue(geneIDs[i], statType)));
}
/**
 * Fill the second half with random data
 */
for (unsigned int i = genesPerSet; i < totalGenes; i++) {
    geneIDs.push_back(randomGenes.at(i - genesPerSet));
    isOriginal.push_back(false);
    value.push_back(fabs(expressionLevels.getValue(geneIDs[i], statType)));
}
totalGenes = geneIDs.size();
/**
 * calculate the scores
 */
if (statType == Fold_Change || statType == T_Statistic
        || statType == Z_Statistic) {
    //      Higher value is a winner
    for (unsigned int i = 0; i < totalGenes; i++) {
        tempScore = 0;
        if (!isOriginal[i]) {
            for (int j = 0; j < genesPerSet; j++) {
                if (value.at(i) > value.at(j)) {
                    tempScore++;
                }
            }

        } else {
            for (unsigned int j = genesPerSet; j < totalGenes; j++) {
                if (value.at(i) > value.at(j)) {
                    tempScore++;
                }
            }

        }

        score.push_back(tempScore);
    }

} else if (statType == FDR_PValue || statType == PValue) {
    // Lower value is a winner
    for (unsigned int i = 0; i < totalGenes; i++) {
        tempScore = 0;
        if (!isOriginal[i]) {
            for (int j = 0; j < genesPerSet; j++) {
                if (value.at(i) < value.at(j)) {
                    tempScore++;
                }
            }

        } else {
            for (unsigned int j = genesPerSet; j < totalGenes; j++) {
                if (value.at(i) < value.at(j)) {
                    tempScore++;
                }
            }

        }

        score.push_back(tempScore);
    }

} else {
    cout << endl << "ERROR. Statistic type not defined." << endl;
}

/**
 * calculate Ua, Ub and U
 */
int U_Original = 0, U_Random = 0, U_Final;
for (int j = 0; j < genesPerSet; j++) {
    U_Original += score[j];
}
for (unsigned int j = genesPerSet; j < totalGenes; j++) {
    U_Random += score[j];
}
U_Final = (U_Original < U_Random) ? U_Original : U_Random;

/**
 * calculate z
 */
double Zn, Zd, Z;
Zn = U_Final - ((genesPerSet * genesPerSet) / 2);
Zd = sqrt(
        (double) (((genesPerSet * genesPerSet
                * (genesPerSet + genesPerSet + 1)))) / 12.0);
Z = Zn / Zd;

/**
 * Return 0/1/2
 * 2: p value < 0.01
 * 1: 0.01 < p value < 0.05
 * 0: p value > 0.05
 */
if (fabs(Z) > 2.303)
    return 2;
else if (fabs(Z) > 1.605)
    return 1;
else
    return 0;
}

So I assume you have benchmarked it and it turned out to be the primary bottleneck, right? — , Mar 28 '13 at 07:50
Not looked in detail but one "obvious" point stands out - if you know the expected size of a std::vector beforehand, resize() or reserve() first to avoid repeated memory reallocations. — Roger Rowland, Mar 28 '13 at 07:50
No, I use eclipse on ubuntu and I don't think there is a way to profile the code it this IDE. — Pranjal, Mar 28 '13 at 07:51
Only thing is when I run this code, to finish all the iterations it take about 165 minutes. — Pranjal, Mar 28 '13 at 07:52
It doesn't help you, but this looks like something you could rewrite in a map-reduce style to make it possible to run on a lot of machines in parallel. See http://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c for some pointers. — Paul Dixon, Mar 28 '13 at 07:55
@Pranjal Don't use `at`, use `[]`. `at` is slower because it does bounds checking, if you're sure you have the bounds correct use `[]`. `value[i]` not `value.at(i)` etc. — john, Mar 28 '13 at 07:55
Perhaps pass the objects into the function by reference. Saves copying potentially large objects. — john, Mar 28 '13 at 07:58
@Prajnal - only one other suggestion - you might get fewer downvotes if you ask on codereview - http://codereview.stackexchange.com/ — Roger Rowland, Mar 28 '13 at 08:00
@Pranjal. Im planing to work on uArr. Your code is interesant for me. I have it in my IDE, but it will take a bit before I post something back to you. — qPCR4vir, Mar 28 '13 at 08:12
thanks everyone, any input is much appreciated. Any suggestion on how to go about profiling the code so as to find the bottlenecks are also welcome. — Pranjal, Mar 28 '13 at 08:23
@Pranjal: Take a look [*here*](http://stackoverflow.com/a/927773/23771). When I see all those `push_back`s and `at`s, I know you have *lots* of room for speedup. — Mike Dunlavey, Mar 28 '13 at 12:28

qPCR4vir · Answer 1 · 2013-03-29T17:50:42.143

Your code have a complexity of O(N*N) [genesPerSet=N]. But using the fact that the order of the values is irrelevant for you we can order it in O(N•log(N)) and compute the “scores” in O(N). (With potentially can be thousands time faster).

Also, we have a total of N*N comparisons. Then U_Original + U_Random = N*N, meaning we don’t need to compute U_Random. Also your statistic Zn= Umin-N*N/2;[Umix=min(U_Original,U_Random)], when you only abs(Zn/Zd) is symmetric around N*N/2. We need only one algorithm.

1.- A first thing can be to take arguments by (const) reference :

int WilcoxinRST::work(const GeneSet &originalGeneSet, const vector<string> &randomGenes)

2.- You fill into vector geneIDs; but dont use it ? Why?

3.- You can iterate only 2 times.

4.- You keep the signal values (probe intensitat?) together in one vector and use another vector to signal what is each item – simple keep in two vector.

5.- You don’t need the score vector, only the summa.

6.- Why 20 millions times? I gess your are computing some “statistic” stability or BStrap. Probably you use the same originalGeneSet many time. I think you can in another question post the code that call this function to spare in making each time the vector of values and sorting.

Here is first the new O(N•log(N)) code.

What follows is a cleanup of your code but still O(N*N), fast but only by a constant factor.

Then the same code but mixed with yours original code and with more comments.

Please, debugg this and tell me how was.

#include<vector>
#include<algorithm>

int WilcoxinRST::work(const GeneSet &originalGeneSet , const vector<string>& randomGenes) 
{
    size_t genesPerSet = originalGeneSet.geneCount();
    std::vector<double> valueOri(genesPerSet), valueRnd(genesPerSet);
    /**
     * Fill the valueOri vector with original gene set data, and valueRnd with random data
     */
    for (size_t i = 0; i < genesPerSet; i++) 
    {
      valueOri[i] = std::fabs(expressionLevels.getValue( originalGeneSet.getMemberGeneAt(i) , statType ));
      valueRnd[i] = std::fabs(expressionLevels.getValue( randomGenes.at(i)                  , statType ));
    }
    std::sort(valueOri.begin(),valueOri.end());
    std::sort(valueRnd.begin(),valueRnd.end());

    /**
     * calculate the scores Ua, Ub and U
     */
    long U_Original ;

    if (statType == Fold_Change || statType == T_Statistic  || statType == Z_Statistic 
        statType == FDR_PValue  || statType == PValue ) 
    {
        //      Higher value is a winner
        size_t j=0;
        for (size_t i = 0; i < genesPerSet /*totalGenes*/; i++)      // i   -  2x
        {   
            while(valueOri[i]  > valueRnd[j]) ++j ;
            U_Original += j;
        }
    } else { cout << endl << "ERROR. Statistic type not defined." << endl;  }

    /**
     * calculate z
     */
    double Zn, Zd, Z;
    Zn = U_Original - ((genesPerSet * genesPerSet) / 2);
    Zd = std::sqrt( (double) (((genesPerSet * genesPerSet* (genesPerSet + genesPerSet + 1)))) / 12.0);
    Z = Zn / Zd;

    /**
     * Return 0/1/2
     * 2: p value < 0.01
     * 1: 0.01 < p value < 0.05
     * 0: p value > 0.05
     */
         if (std::fabs(Z) > 2.303)  return 2;
    else if (std::fabs(Z) > 1.605)  return 1;
    else                            return 0;
}

What follows is a cleanup of your code but still O(N*N), fast but only by a constant factor.

#include<vector>
using namespace std;
class GeneSet ;
class WilcoxinRST;

int WilcoxinRST::work(const GeneSet &originalGeneSet , const vector<string>& randomGenes) 
{
    size_t genesPerSet = originalGeneSet.geneCount();
    vector<double> valueOri(genesPerSet), valueRnd(genesPerSet);
    /**
     * Fill the valueOri vector with original gene set data, and valueRnd with random data
     */
    for (size_t i = 0; i < genesPerSet; i++) 
    {
        valueOri[i]   =  fabs(expressionLevels.getValue( originalGeneSet.getMemberGeneAt(i) , statType ));
        valueRnd[i]   =  fabs(expressionLevels.getValue( randomGenes.at(i)                  , statType ));
    }
    /**
     * calculate the scores Ua, Ub and U
     */
    long U_Original = 0, U_Random = 0, U_Final;

    if (statType == Fold_Change || statType == T_Statistic  || statType == Z_Statistic) 
    {
        //      Higher value is a winner
        for (size_t i = 0; i < genesPerSet /*totalGenes*/; i++)      // i   -  2x
        {   for (size_t j = 0; j < genesPerSet; j++)   
            {   U_Random  +=  (valueRnd[i]  > valueOri[j]);// i en 2 set=Rnd, j in 1 set=Ori. count how many Ori are less than this Rnd 
                U_Original+=  (valueOri[i]  > valueRnd[j]);// i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are less than this Ori 
            }
        }
    } else 
    if (statType == FDR_PValue || statType == PValue) 
    {
        // Lower value is a winner
        for (size_t i = 0; i < genesPerSet; i++)   
        {   
            for (size_t j = 0; j < genesPerSet; j++)   
            {   U_Random  +=  (valueRnd[i]  < valueOri[j]);// i en 2 set=Rnd, j in 1 set=Ori. count how many Ori are > than this Rnd 
                U_Original+=  (valueOri[i]  < valueRnd[j]);// i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are > than this Ori 
            }
        }
    } else { cout << endl << "ERROR. Statistic type not defined." << endl;  }


    U_Final = (U_Original < U_Random) ? U_Original : U_Random;

    /**
     * calculate z
     */
    double Zn, Zd, Z;
    Zn = U_Final - ((genesPerSet * genesPerSet) / 2);
    Zd = sqrt(
            (double) (((genesPerSet * genesPerSet
                    * (genesPerSet + genesPerSet + 1)))) / 12.0);
    Z = Zn / Zd;

    /**
     * Return 0/1/2
     * 2: p value < 0.01
     * 1: 0.01 < p value < 0.05
     * 0: p value > 0.05
     */
         if (fabs(Z) > 2.303)       return 2;
    else if (fabs(Z) > 1.605)       return 1;
    else                            return 0;
}

the same code but mixed with yours original code and with more comments.

int WilcoxinRST::work(const GeneSet &originalGeneSet , const vector<string>& randomGenes) 
{
    size_t genesPerSet = originalGeneSet.geneCount();
    unsigned int totalGenes, tempScore;
    totalGenes = genesPerSet * 2;

    //vector<string> geneIDs;
    //vector<bool>   isOriginal;
    //vector<int>    rank;
    vector<double> valueOri(genesPerSet), valueRnd(genesPerSet);
    //vector<int>    score;
    /**
     * Fill the first half of the vectors with original gene set data
     */

    for (size_t i = 0; i < genesPerSet; i++) 
    {
        //geneIDs.push_back( originalGeneSet.getMemberGeneAt(i)  );
        //isOriginal.push_back(true);

        valueOri[i]   =  fabs(expressionLevels.getValue( originalGeneSet.getMemberGeneAt(i) , statType ));
        valueRnd[i]   =  fabs(expressionLevels.getValue( randomGenes.at(i)                  , statType ));
    }
    /**
     * Fill the second half with random data
     */
    //for (unsigned int i = genesPerSet; i < totalGenes; i++) {
    //  geneIDs.push_back(randomGenes.at(i - genesPerSet));
    //  isOriginal.push_back(false);
    //  value.push_back(fabs(expressionLevels.getValue(geneIDs[i], statType)));
    //}
    //totalGenes = geneIDs.size();
    /**
     * calculate the scores
     */
        /**
     * calculate Ua, Ub and U
     */
    long U_Original = 0, U_Random = 0, U_Final;
    //for (int j = 0; j < genesPerSet; j++) // j in 1 set=Ori. count how many Ori are less than this Rnd
    //{
    //  U_Original += score[j];
    //}
    //for (unsigned int j = genesPerSet; j < totalGenes; j++) // j in 2 set=Rnd, count how many Rnd are less than this Ori 
    //{
    //  U_Random += score[j];
    //}

    if (statType == Fold_Change || statType == T_Statistic  || statType == Z_Statistic) 
    {
        //      Higher value is a winner
        for (size_t i = 0; i < genesPerSet /*totalGenes*/; i++)      // i   -  2x
        {   //tempScore = 0;
            //if (!isOriginal[i])  // i en 2 set=Rnd, j in 1 set=Ori. count how many Ori are less than this Rnd 
            for (size_t j = 0; j < genesPerSet; j++)   
            {   U_Random  +=  (valueRnd[i]  > valueOri[j]);// i en 2 set=Rnd, j in 1 set=Ori. count how many Ori are less than this Rnd 
                U_Original+=  (valueOri[i]  > valueRnd[j]);// i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are less than this Ori 
            }
            //} else 
            //{
            //  for (unsigned int j = genesPerSet; j < totalGenes; j++)  // i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are less than this Ori 
            //  {   if (value.at(i) > value.at(j)) {    tempScore++;        }
            //  }

            //}
            //score.push_back(tempScore);
        }

    } else 
    if (statType == FDR_PValue || statType == PValue) 
    {
        // Lower value is a winner
        for (size_t i = 0; i < genesPerSet; i++)   
        {   
            for (size_t j = 0; j < genesPerSet; j++)   
            {   U_Random  +=  (valueRnd[i]  < valueOri[j]);// i en 2 set=Rnd, j in 1 set=Ori. count how many Ori are > than this Rnd 
                U_Original+=  (valueOri[i]  < valueRnd[j]);// i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are > than this Ori 
            }
            //} else 
            //{
            //  for (unsigned int j = genesPerSet; j < totalGenes; j++)  // i in 1 set=Ori, j in 2 set=Rnd, count how many Rnd are less than this Ori 
            //  {   if (value.at(i) > value.at(j)) {    tempScore++;        }
            //  }

            //}
            //score.push_back(tempScore);
        }

        //for (unsigned int i = 0; i < totalGenes; i++) 
        //{   tempScore = 0;
        //  if (!isOriginal[i]) 
        //  {   for (int j = 0; j < genesPerSet; j++) {
        //          if (value.at(i) < value.at(j)) {  // Rnd i < Ori j increm U_Random
        //              tempScore++;
        //          }
        //      }

        //  } else {
        //      for (unsigned int j = genesPerSet; j < totalGenes; j++) { // Ori i < Rnd j. Increm U_Original
        //          if (value.at(i) < value.at(j)) {
        //              tempScore++;
        //          }
        //      }

        //  }

        //  score.push_back(tempScore);
        //}

    } else { cout << endl << "ERROR. Statistic type not defined." << endl;  }


    U_Final = (U_Original < U_Random) ? U_Original : U_Random;

    /**
     * calculate z
     */
    double Zn, Zd, Z;
    Zn = U_Final - ((genesPerSet * genesPerSet) / 2);
    Zd = sqrt(
            (double) (((genesPerSet * genesPerSet
                    * (genesPerSet + genesPerSet + 1)))) / 12.0);
    Z = Zn / Zd;

    /**
     * Return 0/1/2
     * 2: p value < 0.01
     * 1: 0.01 < p value < 0.05
     * 0: p value > 0.05
     */
    if (fabs(Z) > 2.303)
        return 2;
    else if (fabs(Z) > 1.605)
        return 1;
    else
        return 0;
}

Thanks for the detailed answer. I just saw it now. I will try it and reply back. Really appreciate the effort. thanks — Pranjal, Apr 01 '13 at 23:41

How to make this code faster

1 Answers1