I'm trying to create an application like the one in the title. Almost everything works fine, except multithreading in ASM. When I use multithreading for the multiplication, sometimes some elements in the first column of the first row of the result matrix are 0. I have tried everything and can't find my mistake, so I'd like to ask for your help. Here is the code written in C++:
    void multiply(int* resultRow, int* row, int* column, int size)
    {
        int* startCol = column;
        int* start = row;
        for (int i = 0; i < size; i++)
        {
            column = startCol;
            column += i;
            (*resultRow) = 0;
            row = start;
            for (int j = 0; j < size; j++)
            {
                (*resultRow) += ((*row) * (*column));
                row++;
                column += size;
            }
            resultRow++;
        }
    }
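For reference, here is a minimal sketch of how this per-row function can be driven with one thread per row in plain C++. The `multiplyThreaded` helper and the flat row-major `std::vector<int>` layout are assumptions made for illustration and are not part of the original project, where the threads are started from C#. The `multiply` body is the same arithmetic as above, written with indexing instead of pointer increments.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Same arithmetic as the function above: one row of the first matrix
// times every column of the second, written into one result row.
void multiply(int* resultRow, const int* row, const int* column, int size)
{
    for (int i = 0; i < size; i++)
    {
        int sum = 0;
        for (int j = 0; j < size; j++)
            sum += row[j] * column[j * size + i];
        resultRow[i] = sum;
    }
}

// Hypothetical driver (an assumption, not the project's code):
// starts one thread per result row and waits for all of them.
std::vector<int> multiplyThreaded(const std::vector<int>& a,
                                  const std::vector<int>& b, int size)
{
    assert(size > 0);
    assert(a.size() == static_cast<std::size_t>(size) * size);
    assert(b.size() == a.size());

    std::vector<int> result(a.size(), 0);
    std::vector<std::thread> threads;
    threads.reserve(size);
    for (int r = 0; r < size; r++)
    {
        // Each thread writes only result[r*size] .. result[r*size + size - 1],
        // so no two threads touch the same element.
        threads.emplace_back(multiply,
                             result.data() + r * size,
                             a.data() + r * size,
                             b.data(), size);
    }
    for (auto& t : threads)
        t.join();
    return result;
}
```

With a = {1, 2, 3, 4} and b = {5, 6, 7, 8} as 2x2 matrices, `multiplyThreaded(a, b, 2)` should return {19, 22, 43, 50}. Comparing the ASM output against a reference like this, element by element, is one way to see exactly which entries differ between runs.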
The C++ multiply function works fine even with multithreading (resultRow is the address of the i-th row of the result matrix; row and column are the addresses of the row and column in the matrices I want to multiply). Here is the ASM code:
    .CODE
    ;-------------------------------------------------------------------------
    ;-------------------------------------------------------------------------
    AsmMultiplication PROC loopCount: qword, secondLoopCount: qword, startColAddress : qword, startRowAddress : qword, count : qword, matrixSize : qword
        ; resultRow in RCX
        ; rowToMultiply in RDX
        ; colToMultiply in R8
        ; size in R9
        mov matrixSize, R9
        mov loopCount, R9
        mov secondLoopCount, R9
        mov count, 0
        mov R10, RDX
        mov R9, RCX
        mov startColAddress, R8
        mov startRowAddress, R10
    loop1:
        mov R8, startColAddress ; column = startColAddress
        mov R10, startRowAddress ; row = startRowAddress
        mov RAX, count ; |
        mov RCX, 4 ; |
        mul RCX ; |
        add R8, RAX ; | column += i
        xor RAX, RAX ; |
        mov [R9], RAX ; (*resultRow) = 0
        mov RAX, matrixSize ; |
        mov loopCount, RAX ; |
        pxor xmm2, xmm2 ; | preparing for multiplying in loop2
        inc count
    loop2:
        movq xmm0, qword ptr [R10] ;move actual row element to vector
        movq xmm1, qword ptr [R8] ;move actual column element to vector
        pmuludq xmm0, xmm1 ;multiply vectors
        paddq xmm2, xmm0 ;add result to third vector
        add R10, 4 ; row++
        mov RAX, matrixSize ; |
        mov RDX, 4 ; |
        mul RDX ; |
        add R8, RAX ; | column += size
        mov RDX, loopCount ; |
        dec RDX ; | decrementing loop counter
        mov loopCount, RDX ; |
        jnz loop2 ; | if loopCount == 0 break
        movq RAX, xmm2 ; |
        mov [R9], RAX ; | resultRow = rows * columns
        add R9, 4 ; | resultRows++
        mov RDX, secondLoopCount ; |
        dec RDX ; |
        mov secondLoopCount, RDX ; |
        jnz loop1 ; | if secondLoopCount == 0 break
        ret
    AsmMultiplication ENDP
    end
Here is how I'm using multithreading:
    public void ThreadedFunction(int size, int rows)
    {
        unsafe
        {
            fixed (int* resultRow = &m3.matrix[rows, 0])
            fixed (int* rowToMultiply = &m1.matrix[rows, 0])
            fixed (int* colToMultiply = &m2.matrix[0, 0])
                if (Asm == false)
                {
                    MatrixMultiplication.App.multiply(resultRow, rowToMultiply, colToMultiply, size);
                }
                else
                {
                    MatrixMultiplication.App.AsmMultiplication(resultRow, rowToMultiply, colToMultiply, size);
                }
        }
    }
    ...
    for (int i = 0; i < threadsCount; i++)
    {
        threads[i] = this.StartTheThread(size, rows);
        rows++;
    }
Here is a sample result matrix, written to a .txt file, produced with multithreading.
Correct: (image)
Incorrect: (image)
I have no idea why the output is sometimes correct and sometimes not, but the mistake only appears in the first row of the result matrix. Could anyone explain what's wrong? I know that the n threads all use the same rows and columns of the matrices, but then why does the C++ code work fine?