Vectorising my scalar function

Question

Let's say I want to compute a raised cosine. I could use a macro, `#define cos_raised(x) (0.5f + 0.5f * cos(x))`, but for the sake of my problem I want to make it a function, like this:

float cos_raised(float x)
{
    return 0.5f + 0.5f * cos(x);
}

This works fine, but only with a single float input, even though it could easily be vectorised. How do I properly vectorise it so that it accepts float2/3/4/8/16 as input and output, without duplicating the body of the function? (This is a trivial example, but I need to know this for much more complex functions.)

Edit: I guess I'm asking how to make a gentype function? Just typing gentype doesn't work though.

Will there be a single version of `cos_raised` in your kernel, or multiple versions, i.e. simultaneously having a `float2` and a `float4` version? If there's always going to be one version only, defining a few macros when building the kernel works. — chippies, Jan 10 '16 at 16:47
What macros? The idea is to have a single version but have it work with those different types. — Michel Rouzic, Jan 10 '16 at 21:00
What I meant was: let's say your kernel only uses one version of `floatN cos_raised(floatN x)`, but you don't know what `floatN` will be until your program actually runs. This scenario applies when you're vectorising all your functions to work with the same vector width, but that width only gets defined at runtime. If this is the case, you could use the `options` parameter of `clBuildProgram` and pass something like `"-D floatN=float2"`. This way, you have a single version of the function in your code, and it will work with whichever type is substituted for `floatN`. Is this sufficient? — chippies, Jan 10 '16 at 21:38
Oh I see. That's an interesting idea, but not really general enough as I might want to mix it up. — Michel Rouzic, Jan 11 '16 at 01:02
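The build-option idea from the comments above could be sketched like this on the host side, in Python. The names `KERNEL_SOURCE`, `VALID_WIDTHS` and `build_options` are hypothetical; only the `-D floatN=...` option and the function body come from the thread. The returned string is what would be passed as the `options` argument of `clBuildProgram`.

```python
# Hypothetical host-side helper; it only builds strings and does not call OpenCL.

# Kernel-side source written once against the placeholder type floatN.
KERNEL_SOURCE = """
floatN cos_raised(floatN x)
{
    return 0.5f + 0.5f * cos(x);
}
"""

# Width 1 stands for plain float; the others are the OpenCL C vector widths.
VALID_WIDTHS = (1, 2, 3, 4, 8, 16)


def build_options(width):
    """Return the compiler option that defines floatN for the given width."""
    if width not in VALID_WIDTHS:
        raise ValueError(f"unsupported vector width: {width}")
    type_name = "float" if width == 1 else f"float{width}"
    return f"-D floatN={type_name}"


if __name__ == "__main__":
    for w in VALID_WIDTHS:
        print(build_options(w))
```

As noted in the comments, this fixes one width per program build, so it does not help when several widths are needed in the same kernel.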

score 0 · Answer 1 · edited May 23 '17 at 12:15

IIRC: Sadly, "gentype" is a concept that exists only in the OpenCL documentation; it's not an actual language feature that lets you create generic, template-like functions yourself. That means there is no easy way to do what you want, and you'll probably have to work some preprocessor magic to minimize code duplication. See e.g. this SO thread, "How to use C++ templates in OpenCL kernels?", which offers more knowledge than I could.
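One form the "preprocessor magic" could take is a macro that stamps out one suffixed function per type using token pasting. This is only a sketch: the OpenCL C source is held in a Python string, as host code often does, and `DEFINE_COS_RAISED`, `PREAMBLE` and `with_preamble` are hypothetical names. The functions get a type suffix (e.g. `cos_raised_float4`) on the assumption that user-defined functions cannot be overloaded in plain OpenCL C.

```python
# Hypothetical sketch: the macro body is written once, then instantiated per type.
# The macro is kept on a single line so no backslash continuations are needed.
PREAMBLE = "\n".join([
    "#define DEFINE_COS_RAISED(T) "
    "T cos_raised_##T(T x) { return 0.5f + 0.5f * cos(x); }",
    "DEFINE_COS_RAISED(float)",    # expands to: float cos_raised_float(float x) {...}
    "DEFINE_COS_RAISED(float2)",
    "DEFINE_COS_RAISED(float3)",
    "DEFINE_COS_RAISED(float4)",
    "DEFINE_COS_RAISED(float8)",
    "DEFINE_COS_RAISED(float16)",
]) + "\n"


def with_preamble(kernel_source):
    """Prepend the macro-generated functions to the kernel source."""
    return PREAMBLE + kernel_source


if __name__ == "__main__":
    print(with_preamble("__kernel void demo(__global float4 *buf) { }"))
```

A kernel could then call `cos_raised_float4(v)` for a `float4` value `v`; the body still exists in only one place.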

score 0 · Answer 2 · answered Jan 10 '16 at 18:53

If the compiler inlines all of your function calls before it does vectorization then you're all set. BTW, vectorization is likely only needed for CPU execution since most GPUs are scalar now.

Dithermaster

How? As it is, just feeding cos_raised() a float4 fails to compile. – Michel Rouzic Jan 10 '16 at 21:01
Sorry, I'm not understanding your question. Please re-state it. – Dithermaster Jan 11 '16 at 14:31
How am I all set? What would I have to do to make it work like this? Because as it is it wouldn't compile. – Michel Rouzic Jan 12 '16 at 17:51
Sorry, I misread. If you want versions of your function that take vector types, you'll need to write each one. – Dithermaster Jan 13 '16 at 15:50

score 0 · Answer 3 · answered Jan 11 '16 at 19:55

Since OpenCL kernels get compiled at runtime, you can add extra lines of code at the start of the kernel. I'd put this to use by having a template function (e.g. stored in a separate text file) like this:

float{N:} cos_raised_float{N:}(float{N:} x)
{
    return 0.5f + 0.5f * cos(x);
}

Since I'm more familiar with Python, I've used its syntax for specifying placeholders in a string, i.e. the {N:}. You'll have to find something similar for your host code's language. Then loop over 2, 3, 4, 8 and 16, filling in {N:} each time. This gives you five extra strings to add to the start of your kernel code. The downside is that you'll have to come up with some way to indicate in the main kernel code where all these generated functions should be inserted: they need to appear after all the #pragma XXX: enable statements. Afterwards, your kernel can call any version, e.g. cos_raised_float4.
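A minimal Python sketch of the expansion loop described above. One caveat if `str.format` is used: the literal braces of the function body have to be doubled (`{{` and `}}`), otherwise they are parsed as placeholders. The sketch writes the placeholder as `{N}`, which behaves the same as `{N:}`. The names `TEMPLATE`, `WIDTHS`, `generate_variants` and `insert_after_pragmas` are hypothetical, and inserting after the last `#pragma` line is just one possible way to pick the insertion point.

```python
# Hypothetical names throughout; only the cos_raised body comes from the question.

# Template for one variant. The braces of the C function body are doubled so
# that str.format() leaves them alone and only fills in {N}.
TEMPLATE = """\
float{N} cos_raised_float{N}(float{N} x)
{{
    return 0.5f + 0.5f * cos(x);
}}
"""

WIDTHS = (2, 3, 4, 8, 16)  # the vector widths named in the question


def generate_variants(template=TEMPLATE, widths=WIDTHS):
    """Return OpenCL C source containing one function per vector width."""
    return "\n".join(template.format(N=n) for n in widths)


def insert_after_pragmas(kernel_source, generated):
    """Insert the generated functions after the last '#pragma' line, if any."""
    lines = kernel_source.splitlines()
    last_pragma = -1
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#pragma"):
            last_pragma = i
    head = lines[: last_pragma + 1]   # everything up to and including the pragmas
    tail = lines[last_pragma + 1 :]   # the rest of the kernel source
    return "\n".join(head + [generated] + tail)


if __name__ == "__main__":
    kernel = "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n__kernel void demo() { }"
    print(insert_after_pragmas(kernel, generate_variants()))
```

The combined string is then what gets handed to the OpenCL compiler in place of the original kernel source.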
