I'm looking for a very bare bones matrix multiplication example for CUBLAS that can multiply M times N and place the results in P for the following code, using high-performance GPU operations:
float M[500][500], N[500][500], P[500][500];
for(int i = 0; i < Width; i++){
for(int j = 0; j < Width; j++)
{
M[i][j] = 500;
N[i][j] = 500;
P[i][j] = 0;
}
}
So far, most code I'm finding to do any kind of matrix multiplication using CUBLAS is (seemingly?) overly complicated.
I am attempting to design a basic lab where students can compare the performance of matrix multiplication on the GPU vs matrix multiplication on the CPU, presumably with increased performance on the GPU.