I have rewritten this code, starting from this, in order to vectorize it:

   int c=0;
   for (int j=-halfHeight; j<=halfHeight; ++j)
   {
       #pragma omp simd
       for(int i=-halfWidth; i<=halfWidth; ++i){
           wx_[c] = ofsx + j * a12 + i * a11;
           wy_[c] = ofsy + j * a22 + i * a21;
           x_[c] = (int) floor(wx_[c]);
           y_[c] = (int) floor(wy_[c]);
           ++c;
       }
   }

   std::cout<<"First size="<<size<<std::endl;

   float imat_1[size];
   std::cout<<"imat1"<<std::endl;
   float imat_2[size];
   std::cout<<"imat2"<<std::endl;
   float imat_3[size];
   std::cout<<"imat3"<<std::endl;
   float imat_4[size];
   std::cout<<"imat4"<<std::endl;

   #pragma omp simd
   for(int c=0; c<size; c++){
       if (x_[c] >= 0 && y_[c] >= 0 && x_[c] < width && y_[c] < height){
           wx_[c] -= x_[c];
           wy_[c] -= y_[c];
           imat_1[c] = im.at<float>(y_[c],x_[c]);
           imat_2[c] = im.at<float>(y_[c],x_[c]+1);
           imat_3[c] = im.at<float>(y_[c]+1,x_[c]);
           imat_4[c] = im.at<float>(y_[c]+1,x_[c]+1);
       }
       else{
           wx_[c] = 0;
           wy_[c] = 0;
           imat_1[c] = 0;
           imat_2[c] = 0;
           imat_3[c] = 0;
           imat_4[c] = 0;
           ret = true;
       }
   }

   std::cout<<"Second"<<std::endl;

   #pragma omp simd
   for(int c=0; c<size; c++){
       out[c] =
               (1.0f - wy_[c]) * ((1.0f - wx_[c]) * imat_1[c]   + wx_[c] * imat_2[c]) +
               (       wy_[c]) * ((1.0f - wx_[c]) * imat_3[c] + wx_[c] * imat_4[c]);
   }

In particular, `size` can reach up to 275625. When I run this code, it segfaults at the line `float imat_4[size];`. The crash goes away if I use `float *imat_4 = (float*)malloc(sizeof(float)*size);` instead.

I think this is because of this, so we run out of memory on the stack... But then, how can I solve this? I don't see many other possibilities for vectorizing this code.

Note that performance is crucial here, and as far as I know allocating on the heap is less efficient than allocating on the stack (right?)
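For scale: four buffers of 275625 floats come to roughly 4.4 MB (assuming a 4-byte `float`), which is a lot to ask of the stack. Below is a minimal sketch of the last loop with heap-backed `std::vector` buffers instead. The function name `bilinear_blend` and its signature are hypothetical, introduced only for illustration; the arithmetic is the same as in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the final loop of the question: every buffer
// lives on the heap (std::vector) instead of in a stack-allocated array.
std::vector<float> bilinear_blend(const std::vector<float>& wx,
                                  const std::vector<float>& wy,
                                  const std::vector<float>& imat_1,
                                  const std::vector<float>& imat_2,
                                  const std::vector<float>& imat_3,
                                  const std::vector<float>& imat_4)
{
    const std::size_t size = wx.size();
    assert(wy.size() == size && imat_1.size() == size &&
           imat_2.size() == size && imat_3.size() == size &&
           imat_4.size() == size);

    std::vector<float> out(size);

    // Plain pointers: the loop body only ever sees contiguous floats,
    // exactly as it would with stack arrays.
    const float* pwx = wx.data();
    const float* pwy = wy.data();
    const float* p1 = imat_1.data();
    const float* p2 = imat_2.data();
    const float* p3 = imat_3.data();
    const float* p4 = imat_4.data();
    float* po = out.data();

    // The pragma is ignored if OpenMP support is not enabled.
    #pragma omp simd
    for (std::size_t c = 0; c < size; ++c) {
        po[c] = (1.0f - pwy[c]) * ((1.0f - pwx[c]) * p1[c] + pwx[c] * p2[c]) +
                pwy[c] * ((1.0f - pwx[c]) * p3[c] + pwx[c] * p4[c]);
    }
    return out;
}
```

If the routine runs many times, allocating the vectors once and reusing them across calls keeps the allocation cost out of the hot path.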

justHelloWorld
  • Your compiler might have a flag to increase the maximum stack size. – user253751 Apr 12 '17 at 00:17
  • The short answer to the question: you don't. This is what `std::vector` is for. – Sam Varshavchik Apr 12 '17 at 00:18
  • You *don't!* The stack is limited in size. If you need large "arrays" of data you allocate on the heap (for C) or use [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) (for C++). – Some programmer dude Apr 12 '17 at 00:18
  • Also remember that it doesn't really matter *where* you store the "array". All you need is a pointer to the first element, and the number of elements, and you can treat a pointer to heap-allocated data just the same as a stack-allocated array. In fact, for any array *or pointer* `a` and index `i`, the expression `a[i]` is exactly equal to `*(a + i)`. I.e. no matter what you do the generated code will treat arrays and pointers the same: as pointers. – Some programmer dude Apr 12 '17 at 00:20
  • I updated my question, stating that performance is crucial here. If I'm not wrong, allocating on the heap is less efficient. – justHelloWorld Apr 12 '17 at 00:22
  • If you dynamically allocate a large array once in your program and pass pointers or references to it, you should not notice any performance hit. Allocating large arrays multiple times or copying them eats up execution time. – Thomas Matthews Apr 12 '17 at 00:23
  • @justHelloWorld The location doesn't matter. Arrays are treated the same no matter where they are allocated and placed. There's no "efficiency" difference between an array on the stack and one on the heap. – Some programmer dude Apr 12 '17 at 00:23
  • Just to extend what @Someprogrammerdude is saying, there may be some gains in ensuring your dynamic memory has certain _alignment_ properties. You can indicate this to the compiler when you allocate on the stack (using `alignas`). Currently, you cannot specify an alignment when you use `new`. And so custom allocation routines are required. Times when you might need this are, say, you have a large array of `float` values that you intend to vectorize with SIMD instructions. And on that note, I'd suggest that your algorithm is more of a bottleneck than your memory allocation strategy. – paddy Apr 12 '17 at 00:42
  • @paddy that was what I was thinking about – justHelloWorld Apr 12 '17 at 07:03
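
The alignment point raised in the comments can be sketched without a custom allocator, using only C++11's `std::align` on an over-allocated heap buffer. The class name `AlignedFloatBuffer` and the default 32-byte alignment are assumptions made for illustration; whether alignment helps at all depends on the target and the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: a heap buffer of floats whose first usable element
// sits on an `alignment`-byte boundary (alignment must be a power of two).
class AlignedFloatBuffer {
public:
    explicit AlignedFloatBuffer(std::size_t count, std::size_t alignment = 32)
        : storage_(count * sizeof(float) + alignment), data_(nullptr)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

        void* p = storage_.data();
        std::size_t space = storage_.size();
        // std::align moves p forward to the next aligned address that still
        // leaves room for `count` floats, or returns nullptr if none fits.
        void* aligned = std::align(alignment, count * sizeof(float), p, space);
        assert(aligned != nullptr);
        data_ = static_cast<float*>(aligned);
    }

    // Copying would leave data_ pointing into the other object's storage.
    AlignedFloatBuffer(const AlignedFloatBuffer&) = delete;
    AlignedFloatBuffer& operator=(const AlignedFloatBuffer&) = delete;

    float* data() { return data_; }
    const float* data() const { return data_; }

private:
    std::vector<unsigned char> storage_;  // over-allocated raw bytes on the heap
    float* data_;                         // aligned start inside storage_
};
```

A buffer built as `AlignedFloatBuffer imat_1(size);` can then be indexed through `imat_1.data()[c]` inside the loops, the same way as a plain array.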

0 Answers