
I am trying to compute the product of a 2772x128 matrix and a 4000x128 matrix. Both are matrices of SIFT descriptors. I am using the following code:

Mat a = Mat(nframes, descrSize, CV_8U, DATAdescr);
Mat b = Mat(vocabulary_size, descrSize, CV_8U, vocabulary);
Mat ab =a * b.t();

The problem is that when calculating the product, it throws an error saying

err_msg = 0x00cdd5e0 "..\..\..\src\opencv\modules\core\src\matmul.cpp:711: error: (-215) type == B.type() && (type == CV_32FC1 || type == CV_64FC1 || type == CV_32FC2 || type == CV_64FC2)"

The solution to this has been to convert the data type to CV_32FC1

Mat a = Mat(nframes, descrSize, CV_8U, DATAdescr);
Mat b = Mat(vocabulary_size, descrSize, CV_8U, vocabulary);
a.convertTo(a, CV_32FC1);
b.convertTo(b, CV_32FC1);
Mat ab = a * b.t();

It works well, but it consumes too much time, about 1.2 s. I would like to try the same product using integers, to see if I can speed it up. Am I doing something wrong? I can't see any reason why I cannot compute a matrix product between CV_8U matrices.

EDIT: The answers are about using other libraries or solving the problem another way. I was thinking of opening a new thread asking for advice on solving my problem, but can anybody answer my original question, please? Can I really not multiply CV_8U or CV_32S matrices?

min.yong.yoon

4 Answers


In your other message you said that the following code would take 0.9 seconds.

MatrixXd A = MatrixXd::Random(1000, 1000);
MatrixXd B = MatrixXd::Random(1000, 500);
MatrixXd X;

I tried a little benchmark on my machine, an Intel Core i7 running Linux. My full benchmark code is the following:

#include <Eigen/Dense>
using namespace Eigen;

int
main(int argc, char *argv[])
{
  MatrixXd A = MatrixXd::Random(2772, 128);
  MatrixXd B = MatrixXd::Random(4000, 128);
  MatrixXd X = A*B.transpose();
}

I just use the time command from Linux, so the running time includes launching and stopping the executable.

1/ Compiling with no optimisation (gcc compiler):

g++ -I/usr/include/eigen3 matcal.cpp -O0 -o matcal
time ./matcal
real    0m13.177s  -> this is the time you should be looking at
user    0m13.133s
sys     0m0.022s

13 seconds, that's very slow. By the way, without the matrix multiplication it takes 0.048 s, with bigger matrices than in your 0.9 s example. Why??

Using compiler optimisation with Eigen is very important. 2/ Compiling with some optimisation:

g++ -I/usr/include/eigen3 matcal.cpp -O2 -o matcal
time ./matcal
real    0m0.324s
user        0m0.298s
sys     0m0.024s

Now 0.324s, that's better!

3/ Switching on all the optimization flags (at least all that I know of; I'm not an expert in this field):

g++ -I/usr/include/eigen3 matcal.cpp -O3 -march=corei7 -mtune=corei7 -o matcal 
time ./matcal
real    0m0.317s
user    0m0.291s
sys     0m0.024s

0.317 s, close, but a few ms gained (consistently over a few tests). So in my opinion you do have a problem with your usage of Eigen: either you don't switch on compiler optimization, or your compiler does not do it by itself.

I'm not an expert in Eigen, I have only used it a few times, but the documentation is quite good and you should probably read it to get the most out of it.

Concerning the performance comparison with MatLab: last time I read about it, Eigen was not multithreaded, while MatLab probably uses multithreaded libraries. For matrix multiplication you could split your matrix into several chunks and parallelize the multiplication of each chunk using TBB.
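A minimal sketch of that chunking idea, using std::thread from the standard library for illustration instead of TBB (the row partitioning is the same either way; `matmul_bt_parallel` is a hypothetical helper, not part of Eigen or OpenCV):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Computes C = A * B^T where A is n x k and B is m x k, all row-major.
// A's rows are split into contiguous chunks, one thread per chunk.
void matmul_bt_parallel(const std::vector<double>& A,
                        const std::vector<double>& B,
                        std::vector<double>& C,
                        std::size_t n, std::size_t m, std::size_t k,
                        unsigned nthreads)
{
    auto worker = [&](std::size_t r0, std::size_t r1) {
        for (std::size_t i = r0; i < r1; ++i)
            for (std::size_t j = 0; j < m; ++j) {
                double acc = 0.0;
                for (std::size_t p = 0; p < k; ++p)
                    acc += A[i*k + p] * B[j*k + p];  // row j of B = column j of B^T
                C[i*m + j] = acc;
            }
    };
    std::vector<std::thread> pool;
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t r0 = t * chunk;
        const std::size_t r1 = std::min(n, r0 + chunk);
        if (r0 < r1) pool.emplace_back(worker, r0, r1);
    }
    for (auto& th : pool) th.join();  // threads write disjoint rows, so no locking needed
}
```

Each thread writes a disjoint block of rows of C, so no synchronization beyond the final join is required.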

remi
  • Thanks remi. My CPU is an i5; maybe that is the reason it is a little slower than your run. The point is that Matlab takes 0.05 seconds, an order of magnitude faster. I don't know if it's because of multithreading, but I also saw somewhere that Matlab uses Intel MKL for matrix products. I tried a demo version of Intel MKL, and it takes about 0.20 s. Much better than Eigen or OpenCV, but still far slower than Matlab. – min.yong.yoon Oct 01 '12 at 10:54
    There is a [benchmark](http://eigen.tuxfamily.org/index.php?title=Benchmark) on Eigen page that compares the result for operations such as A'*A with MKL, and it seems to compare well. Maybe you could ask in their forum why there is such a difference, at least between Eigen and MKL in your benchmark. MatLab is tuned to get the best possible optimisations for such operations, that is sure. But it seems reasonable to expect close performances from C++. – remi Oct 03 '12 at 09:41

As suggested by remi, I implemented the same matrix multiplication using Eigen. Here it is:

const int descrSize = 128;
MatrixXi a(nframes, descrSize);
MatrixXi b(vocabulary_size, descrSize);
MatrixXi ab(nframes, vocabulary_size);

unsigned char* dataPtr = DATAdescr;
for (int i=0; i<nframes; ++i)
{
    for (int j=0; j<descrSize; ++j)
    {
        a(i,j)=(int)*dataPtr++;
    }
}
unsigned char* vocPtr = vocabulary;
for (int i=0; i<vocabulary_size; ++i)
{
    for (int j=0; j<descrSize; ++j)
    {
        b(i,j)=(int)*vocPtr ++;
    }
}


ab = a*b.transpose();
MatrixXi aa = a.cwiseProduct(a).rowwise().sum();  // sum of squares per row
MatrixXi bb = b.cwiseProduct(b).rowwise().sum();

MatrixXi d = (aa.replicate(1,vocabulary_size) + bb.transpose().replicate(nframes,1) - 2*ab).cwiseAbs();

The key line is the line that says

ab = a*b.transpose();

vocabulary and DATAdescr are arrays of unsigned char. DATAdescr is 2772x128 and vocabulary is 4000x128. I saw in the documentation that I can use Map, but I failed at first to use it. The initial assignment loops cost about 0.001 s, so they are not the bottleneck. The whole process takes about 1.23 s.

The same implementation in matlab (0.05s.) is:

aa=sum(a.*a,2); bb=sum(b.*b,2); ab=a*b'; 
d = sqrt(abs(repmat(aa,[1 size(bb,1)]) + repmat(bb',[size(aa,1) 1]) - 2*ab));

Thanks in advance for your help, remi.

min.yong.yoon

When you multiply matrices you multiply element values and sum them. If you only have a range of 0-255, it's quite likely that the result is going to be more than 255, so a product of CV_8U matrices isn't very useful.

If you know that your result will fit in a byte, you can just do the multiplication yourself by looping over the elements.
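A minimal sketch of that manual loop (`gemm_u8` is a hypothetical helper, not an OpenCV function), using 32-bit accumulators so the 8-bit products cannot overflow; for 128-element SIFT rows the worst case per entry is 128 * 255 * 255 ≈ 8.3 million, well within int32 range:

```cpp
#include <cstddef>
#include <cstdint>

// Manual product C = A * B^T with 8-bit inputs and 32-bit accumulation.
// A is n x k and B is m x k, both row-major.
void gemm_u8(const std::uint8_t* A, const std::uint8_t* B,
             std::int32_t* C, std::size_t n, std::size_t m, std::size_t k)
{
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t* arow = A + i * k;
        for (std::size_t j = 0; j < m; ++j) {
            const std::uint8_t* brow = B + j * k;  // row j of B = column j of B^T
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p)
                acc += static_cast<std::int32_t>(arow[p]) * brow[p];
            C[i * m + j] = acc;
        }
    }
}
```

Keeping both inner operands as contiguous rows (A row times B row, since the second factor is transposed) gives the cache-friendly access pattern that the .ptr() advice in the comments is aiming at.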

edit: I'm a little surprised that the float version is so much slower; generally OpenCV performs pretty well, with multi-core and optimised SSE2 instructions. Did you build from source? Do you have TBB (i.e. multithreading) and an SSE2 CPU?

Martin Beckett
  • I know that could be a problem. I also tried to use a.convertTo(a, CV_32S);, but the result is the same. I cannot use matrix product using opencv, supposed to be faster than my own implementation. Actually, the reason to use opencv to calculate this is just to try to speed up my own implementation of array products – min.yong.yoon Sep 18 '12 at 15:08
  • @min.yong.yoon generally with a matrix mult you are only doing small matrices of points rather than a huge image and you almost always want a floating point answer so that's what opencv is coded for. You can write your own overload to the "*" operator, or just write a matrix_multiply() function – Martin Beckett Sep 18 '12 at 15:13
  • So, two conclusions. 1. We cannot make product between CV_8U matrices, and 2. If these matrices are large, it is recommended to calculate the results iterating manually. Any clues on iterating these matrices the faster way? – min.yong.yoon Sep 18 '12 at 15:21
  • If they are CV_8U you have to do it manually. hint look at .ptr() to get a pointer to each row and step along that rather than using .at() or CV_ELEM macros – Martin Beckett Sep 18 '12 at 15:27
  • Thanks Martin, I'll try and report results – min.yong.yoon Sep 18 '12 at 15:31

Try compiling OpenCV with Eigen as the back end; there is an option for this in the CMakeLists. I read in your comment that you use OpenCV just to speed up the matrix multiplication, so you might even want to try Eigen directly.

One last solution, use the GPU module of OpenCV.

remi
  • Thanks remi. Actually, I used the Eigen library before trying OpenCV. The computer froze for about 5-10 minutes calculating my matrix product. It is not a problem with the computer; I have an Intel i5 and 8 GB of RAM. That is the reason I tried OpenCV. – min.yong.yoon Sep 19 '12 at 07:07
  • Have you turned on the optimisation flags in your compiler when using Eigen? There is a high improvement of the performances when you do. Apart from some other bug, I doubt OpenCV matrix product would be faster than Eigen's. – remi Sep 19 '12 at 12:22
  • Yes, OpenCV matrix product was faster. Actually, calculating this concrete product with my own loop using pointers is about 0.5 seconds, and using opencv 0.8. But using Eigen the computer got frozen. Maybe some mistake from me? I could share the code, but seeing it was getting frozen I deleted it... Maybe I could try one more time. The most annoying thing is that using Matlab it is only 0.05 s! – min.yong.yoon Sep 19 '12 at 16:34
  • post your code and your compilation command then. Eigen should be around the same speed as MatLab, and definitely faster than loops, at least using a straightforward implementation – remi Sep 21 '12 at 08:10
  • I added a new answer to my own question posting the code remi. – min.yong.yoon Sep 21 '12 at 13:39