I'm pretty new to the whole OpenCL world, and I have created two pretty simple Kernels and I am trying to chain them togehter, but I am getting rather spurious results. When ran individually, they work as expected, but when slapped together, that is when I am seeing the strange results.
So, Each Kernel individually look like this
Vector 3 Noise
__kernel void addVector3Noise( __global struct State* states, __global float3* randomVector3Values){
int stateNum = get_global_id(0);
struct State state = states[stateNum];
float3 randomVal = randomVector3Values[stateNum];
struct State newState;
newState.Vec3 = (float3)(state.Vec3.x + randomVal.x,state.Vec3.y + randomVal.y,state.Vec3.z + randomVal.z);
newState.Vec4 = state.Vec4;
states[stateNum] = newState;
}
For testing this, all of the states had Vec3
of [ 1.0f, 1.0f, 1.0f]
with all of the random values being the same, so the output I get is an array of states with the values [2.0f, 2.0f, 2.0f]
as I would Expect.
Vector 4 Noise
__kernel void addVector4Noise(__global struct State* states,
__global float3* randomVector4Values){
int stateNum = get_global_id(0);
struct State state = states[stateNum];
float3 randomVal = randomVector4Values[stateNum];
float4 newVector4 = randomQuaternionRotation(state.Vector4, randomVal);
struct State newState;
newState.Vector3 = state.Vector3;
newState.Vector4= newVector4;
states[stateNum] = newState;
}
Running this with very simple test data also gives me what I want.
Now the issue comes in when chaining them together. I invoked them in the order Vector 4 noise -> Vector 3 noise. Now when running the Vector 4 noise Kernel, I am seeing the vector 3 values change, and the change seems to follow a pattern.
So, after the vector 4 kernel has been ran, I would expect the vector 3 in each state to be the same as when it was plugged in. So that would mean every state would have a Vector 3 value of [1.0f,1.0f,1.0f]
The following is what I am actually seeing the vector 3's come out as:
[1.0,1.0,1.0]
[0.576367259,1.0,1.0]
[0.999199867,0.6448302,1.0]
[1.313311, 1.067663, 0.3307195]
[-0.08005857, 1.067663, 1.450237]
[1, 0.2340522, 1.136126]
[1, 1, 0.3025152]
[1, 1, 1]
And that pattern repeats itself throughout all of the Vector 3 Values. Note that in the Kernel, it is just copying the Vector3 from the previous state into the new state.
This is how I have them chained together using OpenCL.Net
using (var env = "*".CreateCLEnvironment(DeviceType.Gpu))
{
var source = LoadProgram("kernels.cl");
var context = env.Context;
ErrorCode errorCode;
var program = Cl.CreateProgramWithSource(context, 1u, source, null, out errorCode);
CheckSuccess(errorCode);
errorCode = Cl.BuildProgram(program, (uint)env.Devices.Length, env.Devices, "-cl-opt-disable", null,
IntPtr.Zero);
if (errorCode != ErrorCode.Success)
{
var info = Cl.GetProgramBuildInfo(program, env.Devices[0], ProgramBuildInfo.Log, out errorCode).ToString();
throw new Exception(info);
}
var kernels = Cl.CreateKernelsInProgram(program, out errorCode);
CheckSuccess(errorCode);
var Vector4NoiseKernel = kernels[0];
var Vector3NoiseKernel = kernels[1];
var rnd = new Random();
var states = Enumerable.Range(1, ArrayLength)
.Select(_ => new State
{
Vector3 = new Vector3(1, 1, 1),
Vector4 = new Vector4(0.5f,0.5f,0.5f,0.5f)
})
.ToArray();
var randomVector4Values = Enumerable.Range(1, ArrayLength)
.Select(_ => new Vector3(2f, 2f, 2f))
.ToArray();
var randomVector3Values = Enumerable.Range(1, ArrayLength)
.Select(_ => new Vector3(1f, 1f, 1f))
.ToArray();
var vector4StatesBuffer = context.CreateBuffer(states, MemFlags.ReadWrite);
var randomVector4ValuesBuffer = context.CreateBuffer(randomVector4Values, MemFlags.ReadOnly);
Event ev;
Cl.SetKernelArg(vector4NoiseKernel, 0, vector4StatesBuffer);
Cl.SetKernelArg(vector4NoiseKernel, 1, randomVector4ValuesBuffer);
errorCode = Cl.EnqueueNDRangeKernel(env.CommandQueues[0], vector4NoiseKernel, 1, null
, new[] { new IntPtr(ArrayLength) }, new[] { new IntPtr(1) }, 0u, null, out ev);
errorCode.Check();
env.CommandQueues[0].ReadFromBuffer(vector4StatesBuffer, states, waitFor: ev);
var randomVector3ValuesBuffer = context.CreateBuffer(randomVector3Values, MemFlags.ReadOnly);
var vector3StatesBuffer = context.CreateBuffer(states, MemFlags.ReadWrite);
Cl.SetKernelArg(vector3NoiseKernel, 0, vector3StatesBuffer);
Cl.SetKernelArg(vector3NoiseKernel, 1, randomVector3ValuesBuffer);
errorCode = Cl.EnqueueNDRangeKernel(env.CommandQueues[0], vector3NoiseKernel, 1, null
, new[] { new IntPtr(ArrayLength) }, new[] { new IntPtr(1) }, 0u, null, out ev);
errorCode.Check();
Cl.Finish(env.CommandQueues[0]).Check();
env.CommandQueues[0].ReadFromBuffer(vector3StatesBuffer, states, waitFor: ev);
}
Excuse the huge dollop of code there, but this is a playground project, and I am pretty much just vomiting ideas out, so tidyness and elegance are not an issue here :)
Thanks in advance for any help you may be able to provide.
EDIT
So the first thing I have done this morning is pull each kernel out into its own cl file and make sure each has its own version of state with only what is required for it (Vector4 and Vector3 respectively), along with a new using statement with all the gubbins that goes along with it for the newly separated out Vector3 noise kernel. To my joy, the Vector4 noise kernel did exactly as I expected it to do, however, when it came to the Vector3 noise, a similar issue to previously occured. Still passing in [1.0f,1.0f,1.0f]
as both the random values and the starting Vector3 values, and it is still not producing the output I expect. The pattern that repeats this time is:
[2.0f,2.0f,2.0f]
[1.0f,2.0f,2.0f]
[2.0f,1.0f,2.0f]
[2.0f,2.0f,1.0f]
[2.0f,2.0f,2.0f]