
I'm writing CUDA code that fills an array, writes its contents to disk, refills the array with different values once the first write to disk has completed, and so on. (I'm using this strategy so that the size of global memory doesn't limit the size of my final file.) My question is whether something like

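for(n = 0; n < 100; n++) {
    // fill dev_Out on the GPU for this pass
    kernel<<<blocks, threads>>> (dev_In, dev_Out);
    // copy this pass's results back to the host
    cudaMemcpy(host_Out, dev_Out, size*sizeof(float), cudaMemcpyDeviceToHost);
    // append them to the output file
    fwrite(host_Out, sizeof(float), size, fp);
}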

is safe. I assume that, since only the default stream is used, the kernel executes, then the memory is copied (synchronously), and finally host_Out is written to the file. This is the part I'm not sure about: will the kernel from loop iteration n+1 begin executing while the fwrite from iteration n is still running? I assume not, but when I insert the line

printf("%d ", n);

just after the fwrite() call, before the loop's closing brace, nothing is printed during the program's 3-minute run until the last second or so, when the numbers 0 through 99 all appear at once. This makes me wonder whether printf(), and perhaps fwrite(), are being executed at the wrong times.
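A minimal sketch of the same loop, with comments marking where the host thread blocks. The kernel body, the `CHECK` error macro, the `write_chunks` wrapper, and the launch configuration are all hypothetical additions for illustration, not part of the original code. The sketch also calls `fflush(stdout)` after each `printf`: one likely reason the numbers appear all at once is that `printf("%d ", n)` prints no newline, so the text can sit in stdout's buffer until the program exits, regardless of when the loop body actually ran.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: writes dev_In[i] + n into dev_Out[i].
__global__ void kernel(const float *dev_In, float *dev_Out, int size, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size)
        dev_Out[i] = dev_In[i] + (float)n;
}

// Hypothetical helper: print the CUDA error and abort if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Runs `iterations` fill/copy/write passes; returns the number of floats written.
size_t write_chunks(FILE *fp, int iterations, int size)
{
    size_t bytes = (size_t)size * sizeof(float);
    float *host_Out = (float *)malloc(bytes);
    if (host_Out == NULL)
        return 0;

    float *dev_In, *dev_Out;
    CHECK(cudaMalloc((void **)&dev_In, bytes));
    CHECK(cudaMalloc((void **)&dev_Out, bytes));
    CHECK(cudaMemset(dev_In, 0, bytes));

    int threads = 256;
    int blocks = (size + threads - 1) / threads;
    size_t written = 0;

    for (int n = 0; n < iterations; n++) {
        // The launch is asynchronous: control returns to the host at once.
        kernel<<<blocks, threads>>>(dev_In, dev_Out, size, n);
        CHECK(cudaGetLastError());
        // Blocking copy on the default stream: it waits for the kernel above
        // and returns only after host_Out holds this pass's data.
        CHECK(cudaMemcpy(host_Out, dev_Out, bytes, cudaMemcpyDeviceToHost));
        // fwrite has consumed host_Out (into stdio's buffer or the file) by
        // the time it returns, so the next pass may overwrite host_Out.
        written += fwrite(host_Out, sizeof(float), (size_t)size, fp);
        // stdout is typically line-buffered on a terminal and fully buffered
        // otherwise; "%d " has no newline, so flush to show progress now.
        printf("%d ", n);
        fflush(stdout);
    }
    printf("\n");

    CHECK(cudaFree(dev_In));
    CHECK(cudaFree(dev_Out));
    free(host_Out);
    return written;
}
```

With `fp` opened via `fopen(path, "wb")`, calling `write_chunks(fp, 100, size)` mirrors the loop in the question. Because each iteration's `cudaMemcpy` and `fwrite` return before the next kernel is launched, the host code itself keeps the passes in order; the explicit `fflush` only changes when the progress numbers become visible.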

Thanks in advance for any advice!
