I want to multiply an 8x8 binary matrix, represented as an unsigned 64-bit integer, by an 8-bit vector, represented as an unsigned char. However, due to other constraints the matrix must be stored column by column, so the bytes of the integer do not line up in a way that makes the multiplication easy.
Any ideas on how to speed up this calculation? Every operation counts, because I need to perform billions of these multiplications.
The multiplication is over the two-element field GF(2), so addition is XOR and multiplication is AND.
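To make the computation concrete, here is a minimal sketch in C. The exact bit layout is not stated above, so the code assumes that byte j of the integer (counting from the least significant byte) holds column j, that bit i of that byte is the entry in row i, and that bit j of the vector is its j-th entry. Under that assumption the product is the XOR of the columns selected by the set bits of the vector. The function names `mul_naive` and `mul_branchless` are made up for illustration; the second is one possible branch-free variant, not a claim about the fastest approach.

```c
#include <stdint.h>

/* Assumed layout (hypothetical, not given in the question):
 *   byte j of m (least significant byte = 0) is column j,
 *   bit i of that byte is the entry in row i,
 *   bit j of v is the j-th entry of the vector. */

/* Straightforward version: XOR together the columns selected by v. */
uint8_t mul_naive(uint64_t m, uint8_t v)
{
    uint8_t r = 0;
    for (int j = 0; j < 8; j++) {
        if ((v >> j) & 1)
            r ^= (uint8_t)(m >> (8 * j)); /* add column j over GF(2) */
    }
    return r;
}

/* Branch-free version: build a byte mask from v, then fold with XOR. */
uint8_t mul_branchless(uint64_t m, uint8_t v)
{
    /* Copy v into every byte, then keep only bit j in byte j. */
    uint64_t spread = ((uint64_t)v * 0x0101010101010101ULL)
                      & 0x8040201008040201ULL;

    /* Set the top bit of every nonzero byte. Each byte is at most 0x80,
     * so adding 0x7F never carries into the next byte. */
    uint64_t high = (spread + 0x7F7F7F7F7F7F7F7FULL) & 0x8080808080808080ULL;

    /* Expand each flagged byte to 0xFF. */
    uint64_t mask = (high >> 7) * 0xFFULL;

    /* Keep the selected columns and XOR all eight bytes together. */
    uint64_t sel = m & mask;
    sel ^= sel >> 32;
    sel ^= sel >> 16;
    sel ^= sel >> 8;
    return (uint8_t)sel;
}
```

With this layout the identity matrix is `0x8040201008040201ULL`, so both functions should return `v` unchanged for that matrix. If the rows and bits are ordered differently in the real data, the constants and shifts would need to be adjusted accordingly.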