1

I was wondering how can I iterate over a loop with a any number of work items (per group is irrelevant)

I have 3 arrays and one of them is 2-dimensional(a matrix). The first array contains a set of integers. The matrix is filled with another set of (repeated and random) integers. The third one is only to store the results. I need to search for the farest pair's numbers of occurrences of a number, from the first array, in the matrix. To summarize:

  • A: Matrix with random numbers
  • num: Array with numbers to search in A
  • d: Array with maximum distances of pairs of each number from num

The algorithm is simple(as I don't need to optimize it), I only compare calculated Manhattan distances and keep the maximum value.

To keep it simple, it does the following (C-like pseudo code):

for(number in num){
  maxDistance = 0
  for(row in A){
    for(column in A){
      //calculateDistance is a function to another nested loop like this
      //it returns the max found distance if it is, and 0 otherwise
      currentDistance = calculateDistance(row, column, max)
      if(currentDistance > maxDistance){
        maxDistance = currentDistance
      }
    }
  }
}

As you can see there is no dependent data between iterations. I tried to assign each work item a slice of the matrix A, but still doesn't convince me.

IMPORTANT: The kernel must be executed with only one dimension for the problem.

Any ideas? How can I use the global id to make multiple search at once?

Edit:

I added the code to clear away any doubt. Here is the kernel:

__kernel void maxDistances(int N, __constant int *A, int n, __constant int *numbers, __global int *distances)
{
    
    //N is matrix row and col size
    //A the matrix
    //n the total count of numbers to be searched
    //numbers is the array containing the numbers
    //distances is the array containing the computed distances
    

    size_t id = get_global_id(0);   
    

    int slice = (N*N)/get_global_size(0);

    for(int idx_num = 0; idx_num < n; idx_num++)
    {
        int number = numbers[idx_num];
        int currentDistance = 0;
        int maxDistance = 0;


        for(int c = id*slice; c < (id+1)*slice; c++)
        {
            int i = c/N;
            int j = c%N;
            if(*CELL(A,N,i,j) == number){
                coord_t coords;
                coords.i = i;
                coords.j = j;

                //bestDistance is a function with 2 nested loop iterating over
                //rows and column to retrieve the farest pair of the number
                currentDistance = bestDistance(N,A,coords,number, maxDistance);
                if(currentDistance > maxDistance)
                {
                    maxDistance = currentDistance;
                }
            }
        }

        distances[idx_num] = maxDistance;
    }
}
Blackburn
  • 48
  • 4
  • "As you can see there is no dependent data between iterations" I do not see that.In your code every iterations uses `currentDistance` from the previous iteration. It can be rewritten to have independent iterations, but currently its not the case – 463035818_is_not_an_ai Jan 09 '23 at 16:34
  • "...but still doesn't convince me." why not? What would convince you? – 463035818_is_not_an_ai Jan 09 '23 at 16:34
  • It doesn't because somehow it seems to be incorrect as the matrix nested loops are going to be inside the outer loop, the one that iterates over the array of the numbers to search. Also, I don't know why, but it takes much longer. – Blackburn Jan 09 '23 at 16:52
  • In OpenCL globalid has different values on each thread so you can use it to index the parts of the matrix or array you want that thread to process. Are you fishing for free code here or have you actually tried to write some OpenCL? If you've tried to do it yourself, post what you have, preferably in a way others can just paste and run/test/fix/modify. – Simon Goater Jan 09 '23 at 18:45
  • I am not fishing for anything, i got the code, the same as you explained(global_id and global_size for matrix slicing). The matter is I dont know why it takes forever when a secuential version of the program does the job. I will edit the question and add the code. – Blackburn Jan 09 '23 at 18:57

1 Answers1

0

This answer may be seen as incomplete, nevertheless, I am going to post it in order to close the question.

My problem was not the code, the kernel (or that algorithm), it was the machine. The above code is correct and works perfectly. After I tried my program in another machine it executed and computed the solution with no problem at all.

So, in brief, the problem was the OpenCL device or most likely the host libraries.

Blackburn
  • 48
  • 4