Problem when calling template CUDA kernel

Question

I've been trying to create template kernels, but I've been having some trouble calling them in my program. I have a Matrix<T> template class with some methods defined inside it:

Matrix.h:

template <typename T> class Matrix {
    ...
    void sum(Matrix<T>& m1, Matrix<T>& m2, Matrix<T>& sum);
    ...
};

#include "Matrix.cu"

Matrix.cu:

#include "MatrixKernel.h"

template<typename T> void Matrix<T>::sum(Matrix<T>& m1, Matrix<T>& m2, Matrix<T>& sum) {
    ...
    sumKernel<T><<<dimGrid, dimBlock>>>(m1, m2, sum);
    ...
}

MatrixKernel.h:

template<typename T> __global__ void sumKernel(const Matrix<T> m1, const Matrix<T> m2, Matrix<T> sum) {
...
}

The problem is that when I call sumKernel from inside of sum, the compiler gives me the following error:

error C2059: syntax error : '<'

Does somebody know what's going on? The code compiled fine just before I included the sumKernel call.

Thanks.

I didn't know you could use CUDA and C++ (!). Trivial suggestion: try putting a space between the `<T>` and the `<<<` in case running them together is causing parse problems. — Rup, Jun 14 '11 at 10:43
Did the compiler tell you which line the error is on? There's a lot of — Matt Bond, Jun 14 '11 at 10:45
@Rup: yes, you can. You can even pass objects as arguments to kernels (provided you copied the data of interest to the device memory). I'll try your suggestion too. @Bomadeno: The error is on the line that performs the kernel call. — Renan, Jun 14 '11 at 15:42
Thanks - sounds like I'm just out-of-date. Wikipedia says the C++ support is new in CUDA 2.x. — Rup, Jun 14 '11 at 15:46

Matt Bond · Accepted Answer · 2011-06-15T12:46:10.633

So, it seems you do have a strange #include, which leads to code being compiled by the wrong compiler. Distinguish GPU headers from CPU headers by using the .cu.h extension for CUDA headers, and make sure only NVCC compiles .cu and .cu.h files. The kernel and the kernel call should live in a .cu or .cu.h file, and those files should never be included, directly or indirectly, by a .cpp file.

Because your .cu is being included in a header which is being compiled by the host compiler, the host compiler ends up hitting the token <<< - which it doesn't recognise. It probably does understand the token << so it consumes that, leaving an unexpected <.
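One common way to keep `<<<` away from the host compiler while still using a template kernel is to expose only a plain C++ declaration in the header and to explicitly instantiate the wrapper in the .cu file. Below is a minimal sketch of that idea, not the asker's actual code: the names launchSum, MatrixSum.h and MatrixSum.cu are hypothetical, raw pointers stand in for Matrix<T>, and CUDA error checking is left out for brevity. The `__CUDACC__` macro is defined by nvcc, so the `#else` branch is only a host stand-in that lets the sketch build without nvcc.

```cpp
#include <cassert>
#include <cstddef>

// ---- MatrixSum.h (hypothetical): plain C++ declaration, safe to include from .cpp files ----
template <typename T>
void launchSum(const T* a, const T* b, T* out, std::size_t n);

// ---- MatrixSum.cu (hypothetical): definitions, meant to be compiled by nvcc only ----
#ifdef __CUDACC__
#include <cuda_runtime.h>

// Template kernel: each thread adds one pair of elements.
template <typename T>
__global__ void sumKernel(const T* a, const T* b, T* out, std::size_t n) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Host wrapper: copies the inputs to the device, launches the kernel, copies the result back.
template <typename T>
void launchSum(const T* a, const T* b, T* out, std::size_t n) {
    assert(a && b && out);
    if (n == 0) return;
    const std::size_t bytes = n * sizeof(T);
    T* dA = nullptr; T* dB = nullptr; T* dOut = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&dA), bytes);
    cudaMalloc(reinterpret_cast<void**>(&dB), bytes);
    cudaMalloc(reinterpret_cast<void**>(&dOut), bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);
    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    sumKernel<T><<<blocks, threads>>>(dA, dB, dOut, n);
    cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel to finish
    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
}
#else
// Host stand-in (assumption for illustration) so the sketch also builds without nvcc.
template <typename T>
void launchSum(const T* a, const T* b, T* out, std::size_t n) {
    assert(a && b && out);
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

// Explicit instantiations: host code links against these, so the template
// definition never has to be visible in a header.
template void launchSum<int>(const int*, const int*, int*, std::size_t);
template void launchSum<float>(const float*, const float*, float*, std::size_t);
```

The trade-off is that only the explicitly instantiated types (int and float here) can be used from host code. In a real project the two parts would sit in separate files, and only the .cu file would be handed to nvcc.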

Here's an alternative way of doing things that should work (not tried it but it's similar to code we use)

(note, this might work but it also might not be the right way to solve the problem. My boss doesn't like it as a solution and would prefer to add an implementation per variation)

The underlying problem seems to be lack of distinction between host and device code. I'm leaving the detail out in my solution - things like copying results to and from the device, sum implementation, etc.

The problem I'm trying to solve is, given a construct, how can you template it for use both on the host and the device?

I'll template Matrix.h on both the type and the implementation detail.

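 // Implementation is a template template parameter: a class template taking the element type.
 template <typename T, template <typename> class Implementation> class Matrix {
 public:
     void sum(Matrix& m1, Matrix& m2, Matrix& sum)
     {
         // Delegate the work to the chosen implementation (host or GPU).
         Implementation<T>().sumImp(m1, m2, sum);
     }
 };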

The host implementation, HostMatrixSum.h, will do things on the CPU:

 #include "Matrix.h"

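 template <typename T> struct HostMatrixSum
 {
     // The matrix type that uses this struct as its sum implementation.
     typedef Matrix<T, HostMatrixSum> MatrixType;

     void sumImp(MatrixType& m1, MatrixType& m2, MatrixType& sum)
     {
         ...
     }
 };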

While GpuMatrixSum.cu.h will upload the matrix, do the sum and recover the results:

 #include "Matrix.h"

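 template <typename T> struct GpuMatrixSum;

 // A __global__ function cannot be a class member, so the kernel is a free function template.
 template <typename T> __global__ void sumKernel(const Matrix<T, GpuMatrixSum> m1, const Matrix<T, GpuMatrixSum> m2, Matrix<T, GpuMatrixSum> sum)
 {
     ...
 }

 template <typename T> struct GpuMatrixSum
 {
     typedef Matrix<T, GpuMatrixSum> MatrixType;

     void sumImp(MatrixType& m1, MatrixType& m2, MatrixType& sum)
     {
         ...
         sumKernel<T> <<< dimGrid, dimBlock >>> (m1, m2, sum);
         ...
     }
 };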

Then, when we come to use Matrix from host code, we template on the host sum implementation and never need to see any CUDA specifics:

 #include "Matrix.h"
 #include "HostMatrixSum.h"

 Matrix<int, HostMatrixSum> m1(...);
 Matrix<int, HostMatrixSum> m2(...);
 Matrix<int, HostMatrixSum> result;
 result.sum(m1, m2, result);

And if we're working on the GPU, we can use the accelerated GPU implementation of sum:

 #include "Matrix.h"
 #include "GpuMatrixSum.cu.h"

 Matrix<int, GpuMatrixSum> m1(...);
 Matrix<int, GpuMatrixSum> m2(...);
 Matrix<int, GpuMatrixSum> result;
 result.sum(m1, m2, result);
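For reference, here is a self-contained, host-only sketch of the pattern described above that builds with an ordinary C++ compiler. It is an illustration under my own assumptions rather than the code we use: the std::vector storage, the constructor, and making sum and sumImp static are all choices made for the example. A GPU implementation would plug into the same second template parameter from a .cu.h file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Matrix parameterised on the element type T and on a sum implementation
// (a template template parameter, e.g. HostMatrixSum below).
template <typename T, template <typename> class Implementation>
class Matrix {
public:
    // Hypothetical constructor: rows x cols elements, all set to fill.
    Matrix(std::size_t rows, std::size_t cols, T fill = T())
        : rows_(rows), cols_(cols), data_(rows * cols, fill) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::size_t size() const { return data_.size(); }
    T& operator[](std::size_t i) { return data_[i]; }
    const T& operator[](std::size_t i) const { return data_[i]; }

    // Element-wise sum; the actual work is delegated to the implementation.
    static void sum(const Matrix& m1, const Matrix& m2, Matrix& result) {
        assert(m1.size() == m2.size() && m1.size() == result.size());
        Implementation<T>::sumImp(m1, m2, result);
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};

// CPU implementation of the sum: a plain loop over the elements.
template <typename T>
struct HostMatrixSum {
    typedef Matrix<T, HostMatrixSum> MatrixType;

    static void sumImp(const MatrixType& m1, const MatrixType& m2, MatrixType& result) {
        for (std::size_t i = 0; i < result.size(); ++i) {
            result[i] = m1[i] + m2[i];
        }
    }
};
```

With this in place, `Matrix<int, HostMatrixSum>::sum(m1, m2, result)` runs entirely on the CPU, and the Matrix header itself contains nothing that a host compiler cannot parse.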

Hope that works for you!

I'm going to try that. But at the same time it seems strange that in MatrixKernel.h, for instance, the compiler doesn't complain about the __global__ keyword (which can only mean NVCC is compiling it, right?). Another thing: if what you say actually is the problem, where would I implement the sum method? If I don't write '#include "Matrix.cu"' inside "Matrix.h", there will be a link error, because a template must be declared and defined in the same file... — Renan, Jun 14 '11 at 15:53
I think you were right, I had to forget about implementing Matrix as a template class, cause there was no way I could make it work that way. If I include for instance a .cu file on the header, every other file that includes the Matrix header will also include the .cu, even .cpp ones, which will inevitably give rise to a compilation error. Using template kernels is ok, but making the c++ methods that call them templates too is not possible because of what I just explained. It's a little winding after all... — Renan, Jun 15 '11 at 06:53
Because MatrixKernel.h is only included by a cu file, only nvcc ever includes it. If you included MatrixKernel.h in a host cpp file, I suspect it would fall over. I name cuda specific headers .cu.h to clarify that it's only for inclusion from .cu files. I'm working on an answer to the 'how to do it' part of the question - trying to work out an elegant solution :) — Matt Bond, Jun 15 '11 at 10:16

score 1 · Answer 2 · edited May 23 '17 at 12:28

I had the same problem: error C2059: syntax error : '<'

First, I found a good set up/tutorial here (for visual express 2010 and cuda 4.0): http://www.stevenmarkford.com/installing-nvidia-cuda-with-visual-studio-2010/

What fixed the syntax error for me was the answer to: How do I start a CUDA app in Visual Studio 2010?

Specifically, change the properties of the *.cu file so that its Type is set to "CUDA C/C++".

Finally worked for me.
