
I am getting a "call to cuMemcpyDtoHsync returned error 700: Illegal address during kernel execution" error when I try to parallelize this simple loop.

#include <vector>
#include <iostream>
using namespace std;

int main() {
    vector<float> xF = {0, 1, 2, 3};

    #pragma acc parallel loop
    for (int i = 0; i < 4; ++i) {
        xF[i] = 0.0;
    }
    return 0;
}

Compiled with: $ pgc++ -fast -acc -std=c++11 -Minfo=accel -o test test.cpp

main:
  6, Accelerator kernel generated
     Generating Tesla code
      9, #pragma acc loop gang, vector(4) /* blockIdx.x threadIdx.x */
std::vector<float, std::allocator<float>>::operator [](unsigned long):
  1, include "vector"
      64, include "stl_vector.h"
          771, Generating implicit acc routine seq
               Generating acc routine seq
               Generating Tesla code
T3 std::__copy_move_a2<(bool)0, const float *, decltype((std::allocator_traits<std::allocator<float>>::_S_pointer_helper<std::allocator<float>>((std::allocator<float>*)0)))>(T2, T2, T3):
  1, include "vector"
      64, include "stl_vector.h"
/usr/bin/ld: error in /tmp/pgc++cAUEgAXViQSY.o(.eh_frame); no .eh_frame_hdr table will be created.

$ ./test
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution

The code runs normally without the #pragma, but I would like to make it parallel. What am I doing wrong?

John Doe
  • You probably should not do that, as std::vector is not guaranteed to be thread-safe for parallel writes. https://stackoverflow.com/questions/9042571/ – Jerry Jun 25 '18 at 18:19
  • Hi, John. Have you figured it out? I'm getting the same problem. – Coconut Jul 10 '19 at 02:56
  • @Jerry Do you mean std::vector does not work with OpenACC? – Coconut Jul 10 '19 at 02:56
  • @user3289737 I meant that std::vector is not guaranteed to be thread-safe for parallel writes or concurrent reads and writes in the first place. The spec simply does not require vendors to implement std::vector that way, so even if it works with this implementation, it may not work the same way with another. If you need that, you should use a thread-safe container instead. – Jerry Jul 13 '19 at 15:08

1 Answer


Try compiling with "-ta=tesla:managed".

The problem here is that you aren't explicitly managing the data movement between the host and the device, and the compiler can't manage it implicitly for you: a std::vector is just a class containing pointers, so the compiler can't tell the size of the underlying data. As a result, the device ends up dereferencing host addresses, which causes the illegal memory accesses.

While you can manage the data yourself by grabbing the vector's raw pointer and then using a data clause to copy the vector's data (as well as copying the vector object itself to the device), it's much easier to use CUDA Unified Memory (i.e. the "managed" flag) and let the CUDA runtime manage the data movement for you.
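
For reference, here's a minimal sketch of that manual approach, using the vector's data() pointer with an explicit copy clause (illustrative only; the variable names are mine):

#include <vector>

int main() {
    std::vector<float> xF = {0, 1, 2, 3};
    float *xptr = xF.data();                 // raw pointer to the vector's storage
    int n = static_cast<int>(xF.size());

    // Copy the underlying array to the device for the parallel region
    // and copy the results back when the region ends.
    #pragma acc parallel loop copy(xptr[0:n])
    for (int i = 0; i < n; ++i) {
        xptr[i] = 0.0f;
    }
    return 0;
}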

As Jerry notes, it's generally not recommended to use vectors in parallel code since they are not thread-safe. In this case it's fine, but you may run into other issues, especially if you try to push or pop data. It's better to use arrays, which are also easier to manage between the host and device copies (see the plain-array sketch after the example output below).

% cat test.cpp
#include <vector>
#include <iostream>
using namespace std;

int main() {
    vector<float> xF = {0, 1, 2, 3};

    #pragma acc parallel loop
    for (int i = 0; i < 4; ++i) {
        xF[i] = 0.0;
    }
    for (int i = 0; i < 4; ++i) {
        std::cout << xF[i] << std::endl;
    }
    return 0;
}
% pgc++ -ta=tesla:cc70,managed -Minfo=accel test.cpp --c++11 ; a.out
main:
      6, Accelerator kernel generated
         Generating Tesla code
          9, #pragma acc loop gang, vector(4) /* blockIdx.x threadIdx.x */
      6, Generating implicit copy(xF)
std::vector<float, std::allocator<float>>::operator [](unsigned long):
      1, include "vector"
          64, include "stl_vector.h"
              771, Generating implicit acc routine seq
                   Generating acc routine seq
                   Generating Tesla code
0
0
0
0
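
For comparison, a minimal sketch of the plain-array version mentioned above (illustrative, not part of the original run): with a fixed-size array the compiler can see the bounds, so it can generate the implicit copy itself and neither a data clause nor the "managed" flag should be needed.

#include <iostream>

int main() {
    float xF[4] = {0, 1, 2, 3};

    // The array size is known at compile time, so the compiler can
    // generate an implicit copy(xF[0:4]) for this region.
    #pragma acc parallel loop
    for (int i = 0; i < 4; ++i) {
        xF[i] = 0.0f;
    }

    for (int i = 0; i < 4; ++i) {
        std::cout << xF[i] << std::endl;
    }
    return 0;
}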
Mat Colgrove