Can someone please guide me on how to perform matrix multiplication on the GPU from C# using OpenCL?

I have looked at the OpenCL example here: https://www.codeproject.com/Articles/1116907/How-to-Use-Your-GPU-in-NET

But I am not sure how to proceed for matrix multiplication.

  • Do you not know how to use the GPU, how to multiply matrices, or both? – Frenchy Dec 07 '18 at 11:36
  • I am trying to perform matrix multiplication using OpenCL. The examples I have seen are for one-dimensional arrays, but I have a 2D array. Do I need to convert this 2D array to a 1D array first? If yes, then how does matrix multiplication work on the flattened 1D array? – Abhilash Hazarika Dec 07 '18 at 12:16
  • Yes, you need to flatten the array into 1D, pass that to the GPU, and then compute the right index in the OpenCL kernel when performing the matrix multiplication. – doqtor Dec 07 '18 at 17:37
  • How do I perform the matrix multiplication on the GPU? I can do it on the CPU by running loops, but I do not know what to do when running on the GPU. – Abhilash Hazarika Dec 08 '18 at 09:44

2 Answers

0

Yes, as doqtor says, you need to flatten the matrix into a 1D array. Here is an example that passes several arguments to a kernel:

using System;
using Cloo; // OpenCL bindings used by the CodeProject article

class Program
{
    static string CalculateKernel
    {
        get
        {
            return @"
            kernel void Calc(global int* m1, global int* m2, int size) 
            {
                for(int i = 0; i < size; i++)
                {
                    printf("" %d / %d\n"",m1[i],m2[i] );
                }
            }";
        }
    }

    static void Main(string[] args)
    {

        int[] r1 = new int[]
            {1, 2, 3, 4};

        int[] r2 = new int[]
            {4, 3, 2, 1};

        int rowSize = r1.Length;

        // pick first platform
        ComputePlatform platform = ComputePlatform.Platforms[0];
        // create context with all gpu devices
        ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu,
            new ComputeContextPropertyList(platform), null, IntPtr.Zero);

        // create a command queue with first gpu found
        ComputeCommandQueue queue = new ComputeCommandQueue(context,
            context.Devices[0], ComputeCommandQueueFlags.None);

        // load opencl source and
        // create program with opencl source
        ComputeProgram program = new ComputeProgram(context, CalculateKernel);

        // compile opencl source
        program.Build(null, null, null, IntPtr.Zero);

        // load chosen kernel from program
        ComputeKernel kernel = program.CreateKernel("Calc");

        // copy the first input array into a read-only device buffer
        // (CopyHostPointer avoids keeping a pointer to an unpinned managed array)
        ComputeBuffer<int> row1Buffer = new ComputeBuffer<int>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, r1);

        // copy the second input array into a read-only device buffer
        ComputeBuffer<int> row2Buffer = new ComputeBuffer<int>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, r2);


        kernel.SetMemoryArgument(0, row1Buffer); // first input array (m1)
        kernel.SetMemoryArgument(1, row2Buffer); // second input array (m2)
        kernel.SetValueArgument(2, rowSize); // number of elements in each array

        // execute kernel as a single task (one work-item)
        queue.ExecuteTask(kernel, null);

        // wait for completion
        queue.Finish();

        Console.WriteLine("Finished");
        Console.ReadKey();
    }
}

Another sample, which also reads the result back from the GPU buffer:

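using System;
using System.Runtime.InteropServices; // GCHandle
using Cloo; // OpenCL bindings used by the CodeProject article

class Program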
{
    static string CalculateKernel
    {
        get
        {
            // you could put your matrix algorithm here and write the result to array m3
            return @"
            kernel void Calc(global int* m1, global int* m2, int size, global int* m3) 
            {
                for(int i = 0; i < size; i++)
                {
                    int val = m2[i];
                    printf("" %d / %d\n"",m1[i],m2[i] );
                    m3[i] = val * 4;
                }
            }";
        }
    }

    static void Main(string[] args)
    {

        int[] r1 = new int[]
            {8, 2, 3, 4};

        int[] r2 = new int[]
            {4, 3, 2, 5};

        int[] r3 = new int[4];
        int rowSize = r1.Length;

        // pick first platform
        ComputePlatform platform = ComputePlatform.Platforms[0];
        // create context with all gpu devices
        ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu,
            new ComputeContextPropertyList(platform), null, IntPtr.Zero);

        // create a command queue with first gpu found
        ComputeCommandQueue queue = new ComputeCommandQueue(context,
            context.Devices[0], ComputeCommandQueueFlags.None);

        // load opencl source and
        // create program with opencl source
        ComputeProgram program = new ComputeProgram(context, CalculateKernel);

        // compile opencl source
        program.Build(null, null, null, IntPtr.Zero);

        // load chosen kernel from program
        ComputeKernel kernel = program.CreateKernel("Calc");

        // copy the first input array into a read-only device buffer
        // (CopyHostPointer avoids keeping a pointer to an unpinned managed array)
        ComputeBuffer<int> row1Buffer = new ComputeBuffer<int>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, r1);

        // copy the second input array into a read-only device buffer
        ComputeBuffer<int> row2Buffer = new ComputeBuffer<int>(context,
            ComputeMemoryFlags.ReadOnly | ComputeMemoryFlags.CopyHostPointer, r2);

        // allocate a device buffer for the result; the kernel writes to it,
        // so it must not be flagged ReadOnly
        ComputeBuffer<int> resultBuffer = new ComputeBuffer<int>(context,
            ComputeMemoryFlags.WriteOnly, r3.Length);


        kernel.SetMemoryArgument(0, row1Buffer); // first input array (m1)
        kernel.SetMemoryArgument(1, row2Buffer); // second input array (m2)
        kernel.SetValueArgument(2, rowSize); // number of elements in each array
        kernel.SetMemoryArgument(3, resultBuffer); // output array (m3)
        // execute kernel
        queue.ExecuteTask(kernel, null);

        // wait for completion
        queue.Finish();

        GCHandle arrCHandle = GCHandle.Alloc(r3, GCHandleType.Pinned);
        queue.Read<int>(resultBuffer, true, 0, r3.Length, arrCHandle.AddrOfPinnedObject(), null);

        Console.WriteLine("display result from gpu buffer:");
        for (int i = 0; i < r3.Length; i++)
            Console.WriteLine(r3[i]);

        arrCHandle.Free();
        row1Buffer.Dispose();
        row2Buffer.Dispose();
        resultBuffer.Dispose();
        kernel.Dispose();
        program.Dispose();
        queue.Dispose();
        context.Dispose();

        Console.WriteLine("Finished");
        Console.ReadKey();
    }
}

You then just adapt the kernel to compute the product of the two matrices.

Result of the last program:

 8 / 4
 2 / 3
 3 / 2
 4 / 5
display result from gpu buffer:
16
12
8
20
Finished
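
As a rough sketch of what adapting the kernel to matrix multiplication could look like: the kernel below computes one output element per work-item, with all three matrices stored row-major in 1D buffers. The names MatMulSketch, MatMul, WorkItem and MultiplyOnCpu are made up for illustration and are not from the answer or the article. The C# method mirrors the kernel body on the CPU, so the index arithmetic can be checked without a GPU; the kernel string itself has not been run here.

```csharp
using System;

static class MatMulSketch
{
    // Hypothetical OpenCL kernel: one work-item per output element.
    // a is n x k, b is k x m, c is n x m, all flattened row-major.
    public const string KernelSource = @"
    kernel void MatMul(global const int* a, global const int* b,
                       global int* c, int k, int m)
    {
        int row = get_global_id(0);
        int col = get_global_id(1);
        int sum = 0;
        for (int i = 0; i < k; i++)
            sum += a[row * k + i] * b[i * m + col];
        c[row * m + col] = sum;
    }";

    // CPU stand-in for a single work-item: same indexing as the kernel body.
    static void WorkItem(int row, int col, int[] a, int[] b, int[] c, int k, int m)
    {
        int sum = 0;
        for (int i = 0; i < k; i++)
            sum += a[row * k + i] * b[i * m + col];
        c[row * m + col] = sum;
    }

    // Runs every (row, col) work-item sequentially; a GPU would run them in parallel.
    public static int[] MultiplyOnCpu(int[] a, int[] b, int n, int k, int m)
    {
        var c = new int[n * m];
        for (int row = 0; row < n; row++)
            for (int col = 0; col < m; col++)
                WorkItem(row, col, a, b, c, k, m);
        return c;
    }

    public static void Main()
    {
        int[] a = { 1, 2, 3, 4, 5, 6 };     // 2 x 3
        int[] b = { 7, 8, 9, 10, 11, 12 };  // 3 x 2
        int[] c = MultiplyOnCpu(a, b, 2, 3, 2);
        Console.WriteLine(string.Join(", ", c)); // prints "58, 64, 139, 154"
    }
}
```

On the GPU, the two outer loops disappear: instead of ExecuteTask, the kernel would be launched over a two-dimensional global work size of n by m (in Cloo this is presumably queue.Execute with new long[] { n, m }; check the signature in your version), which is what gives the parallelism.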

Flattening a 2D array to 1D is easy; for example:

        // requires: using System.Linq;
        int[,] twoD = { { 1, 2, 3 }, { 3, 4, 5 } };
        int[] oneD = twoD.Cast<int>().ToArray(); // row-major: 1, 2, 3, 3, 4, 5

And see this link for converting 1D -> 2D.
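
For reference, here is one possible way to go in both directions, assuming a row-major layout. FlattenSketch, Flatten and Unflatten are illustrative names, and using Buffer.BlockCopy for the reverse direction is my own choice, not necessarily what the linked answer does. Element [r, c] of the 2D array sits at index r * cols + c in the 1D array, which is the same index a kernel has to compute.

```csharp
using System;
using System.Linq;

static class FlattenSketch
{
    // 2D -> 1D: multi-dimensional arrays enumerate in row-major order.
    public static int[] Flatten(int[,] m) => m.Cast<int>().ToArray();

    // 1D -> 2D: copy the raw bytes back into a rows x cols array.
    public static int[,] Unflatten(int[] flat, int rows, int cols)
    {
        if (flat.Length != rows * cols)
            throw new ArgumentException("flat.Length must equal rows * cols");
        var m = new int[rows, cols];
        Buffer.BlockCopy(flat, 0, m, 0, flat.Length * sizeof(int));
        return m;
    }

    public static void Main()
    {
        int[,] twoD = { { 1, 2, 3 }, { 3, 4, 5 } };
        int[] oneD = Flatten(twoD);
        int cols = twoD.GetLength(1);

        Console.WriteLine(oneD[1 * cols + 2]); // element [1, 2], prints 5

        int[,] back = Unflatten(oneD, 2, 3);
        Console.WriteLine(back[1, 0]);         // prints 3
    }
}
```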

Frenchy
  • Thank you for the sample code, but the code does not take any advantage of parallel processing. – Abhilash Hazarika Dec 10 '18 at 11:41
  • Yes and no. You take advantage of the fact that memory is fastest inside the GPU, so you gain elapsed time. Your question doesn't ask about parallel programming, just how to communicate with the GPU. If you want to use parallel programming, you have to split the matrix multiplication into separate tasks, with each task doing one part of the work. Not every job can be split, but for matrix multiplication it has been done; see this link -> https://stackoverflow.com/questions/27129420/matrix-multiplication-with-threads and just launch each task on the GPU. – Frenchy Dec 10 '18 at 12:21
0

I found a very good reference for using OpenCL with .NET.

The site is well structured and very useful. It also has a matrix multiplication case study.

OpenCL Tutorial