0

I am new to this, I have a convolution program that takes in a file of data, convolves it and outputs another file. I am attaching the code here.

void convolute()
{
    ifstream fin;
    ofstream fout;

    int count = 0;
    double a=0,b=0;

    string input_file_string = "maxclus_500000node_3M_5000ens_666.dat";

    string output_file_string = "1convolute_"+input_file_string;

    fin.open(input_file_string.c_str());

    while(fin) //to know the size of array to initialize
    {
        fin>>a>>b;
        count++;

    }

    fin.close();
    double* c = NULL;
    c = new double[count+1];
    double* d = NULL;
    d = new double[count+1];

    for(int i=0;i<count+1;i++)
    {
        c[i] = 0;
        d[i] = 0;
    }

    fin.open(input_file_string.c_str());

    int n = 1;

    while(fin) //takes in data
    {
        fin>>a>>b;
        c[n] = a;
        d[n] = b;
        n++;
    }

    fin.close();

    double* binom = NULL;
    binom = new double[count];

    double* summ = NULL;
    summ = new double[count+1];

    for(int i=0;i<count+1;i++) summ[i] = 0;

    for(int i=0;i<count;i++) binom[i] = 0;

    for(int j=1;j<count;++j) //main convolution of data takes place
    {
        int x,y;
        double prob = j*1.0/(count-1);
        binom[j] = 1;

        for(int i=j+1;i<count;++i)
            binom[i] = binom[i-1]*((count-1)-i+1)*1.0/i*prob/(1-prob);

        for(int i=j-1;i>=0;--i)
            binom[i] = binom[i+1]*(i+1)*1.0/((count-1)-i)*(1-prob)/prob;

        double sum = 0;

        for(int i=0;i<count;++i) sum += binom[i];
        for(int i=0;i<count;++i) binom[i] /= sum;

        sum = 0;

        for(int i=1;i<count;++i) sum += d[i]*binom[i];

        summ[j] = sum;

        //fout<<c[j]<<'\t'<<sum<<endl;
        if(j%1000==0)
            cout<<count-1<<'\t'<<j<<endl;

    }

    cout<<"writing to file "<<endl;

    fout.open(output_file_string.c_str());

    for(int i=1;i<count;i++) fout<<c[i]<<'\t'<<summ[i]<<endl;

        fout.close();

    delete [] c;
    c = NULL;
    delete [] d;
    d = NULL;

    delete [] binom;
    binom = NULL;

    delete [] summ;
    summ = NULL;

}

I want to know, what i can do to speed up the part where main convolution takes place, my data files are quite large, and it takes a great amount of time for it to finish. Need help.

Antwane
  • 20,760
  • 7
  • 51
  • 84
  • 6
    Improving working code should be asked about on [CodeReview.SE](https://codereview.stackexchange.com/) – NathanOliver Jun 30 '17 at 13:37
  • One thing that you could potentially do is unroll some of the loops you have there. For instance `for(int i=0;i – Marius Bancila Jun 30 '17 at 13:43
  • 1
    @MariusBancila Shouldn't you leave loop unrolling to the compiler? I doubt that this will make the program significantly faster... – Simon Kraemer Jun 30 '17 at 13:45
  • i am using -funroll-loops optimization flag with the compiler, still isnt fast enough, isn't it the same? – Digonto Islam Jun 30 '17 at 13:48
  • @DigontoIslam Pretty much. Also if `count` isn't a multiple of 5 you will need additional checks so that you don't run into undefined behaviour. Unrolling loops by hand isn't a good idea most of the times. – Simon Kraemer Jun 30 '17 at 13:52
  • 1
    How much of the time is computational, and how much is IO? Time it as-is, compare to time with dummy writes. (The loop could be in its own function, which would help the profiler tell you what's taking the time.) I don't doubt that the O(n**2) loop is a major factor, but IO is often underestimated, too. – Kenny Ostrom Jun 30 '17 at 13:53
  • I can see a lot of initialization loops for arrays of size `count`. You could move these into a single loop... BTW: I would recommend you use `std::vector` instead of C style arrays and remove the initialization loops completely. See also https://stackoverflow.com/a/381656/4181011 – Simon Kraemer Jun 30 '17 at 14:10
  • @KennyOstrom the IO takes about 3.1seconds, the main problematic and computation-hard part is the one where the j loop occurs, i have given a comment there, please do check out and give me any advice to modify it so that it runs faster, thank you – Digonto Islam Jun 30 '17 at 14:11
  • 2
    I have a couple of ideas, but not sure if post them here as the others suggested to move this thread to CodeReview. Are you going to do that? – KjMag Jun 30 '17 at 14:15

2 Answers2

0

Due to the nested for loops, just the part of your code that does the convolution has a run time of T(n) = 5n2 + .... This is obviously rather inefficient for files where n is large.

To solve this, you need to cut down on the number of for loops that have only one line of code. If you traverse the entire array, do as many operations as you can in one pass. To optimize this, you would need to restructure the way you perform the logic so that you have as few for loops as possible.

Are you required to use arrays? Vectors would serve the same (and more efficient) purpose as you can add elements as you read in the file (rather than having to open the file to get the count...). Thus in turn,

vector.size();

gives you the size of the vector. Also, using an entire loop to initialize the array to 0 is only necessary if you are going to increment an element. If you are simply assigning the element, initializing the array to 0 serves no purpose.

If all of this is within one function, then you really need to split it up. This function in its entirety is bad style as there are many unnecessary operations, large amounts of for loops, the function is far too long and does far too many things in itself that is outside the scope of convoluting an input.

#include <vector>
#include <fstream>

void convolute(/*data you need*/);

int main() {
  string input_file_string = "maxclus_500000node_3M_5000ens_666.dat";
  string output_file_string = "1convolute_"+input_file_string;

  ifstream fin (input_file_string);
  ofstream fout (output_file_string);

  int count = 0;
  double a=0,b=0;
  vector<double> c;
  vector<double> d;

  while (fin >> a >> b) {
    c.push_back(a);
    d.push_back(d);
  }
  fin.close();

  vector<double> binom;
  vector<double> summ;

  convolute(/*Pass the Data you need*/);
  ...
}

See how much more simple this is? Using the right data types simplifies your life greatly. Also, for the sake of readability, put the operations inside a for loop on the next line. This makes it more obvious that there is something going on there, and you are still not required to use brackets (as long as the code is only one line long).

rsdev
  • 140
  • 2
  • 4
0

I tried to remove some of you loops. Feel free to ask if my comments aren't sufficient.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <tuple>
#include <algorithm>

using std::cout;
using std::endl;

using std::ifstream;
using std::ofstream;

using std::string;
using std::vector;
using std::pair;

void convolute()
{

    string input_file_string = "maxclus_500000node_3M_5000ens_666.dat";
    string output_file_string = "1convolute_" + input_file_string;


    double a = 0, b = 0;
    vector<pair<double, double>> coordinates;

    ifstream fin;
    fin.open(input_file_string.c_str());
    while (fin)
    {
        fin >> a >> b;
        coordinates.push_back({ a,b });
    }
    fin.close();

    static int count = (int)coordinates.size();
    vector<double> binom(coordinates.size(), 0);
    vector<double> summ(coordinates.size(), 0);

    const double probMul = 1.0 / (coordinates.size() - 1); //Probability multiplicator

    for (int j = 1; j< coordinates.size(); ++j) //main convolution of data takes place
    {
        const double prob = j*probMul;

        binom[j] = 1;

        double sum = binom[j];
        double sum_x = binom[j] * coordinates[j].second; // <- (A/X) + (B/X) +... <=> (A+B+C+...)/X
        for (int i = j + 1; i < count; ++i) //from j+1 to end
        {
            binom[i] = binom[i - 1] * ((count - 1) - i + 1)*1.0 / i*prob / (1 - prob);
            sum += binom[i];
            sum_x += binom[i] * coordinates[i].second;
        }
        for (int i = j - 1; i >= 0; --i)  //from j-1 to front
        {
            binom[i] = binom[i + 1] * (i + 1)*1.0 / ((count - 1) - i)*(1 - prob) / prob;
            sum += binom[i];
            sum_x += binom[i] * coordinates[i].second;
        }

        summ[j] = sum_x / sum;

        if (j % 1000 == 0)
        {
            cout << count - 1 << '\t' << j << endl;
        }
    }


    cout << "writing to file " << endl;

    ofstream fout;
    fout.open(output_file_string.c_str());
    for (int i = 1; i < count; i++)
    {
        fout << coordinates[i].first << '\t' << summ[i] << endl;
    }
    fout.close();
}

I couldn't test it - so that's up to you.

Simon Kraemer
  • 5,700
  • 1
  • 19
  • 49