-1

The question is quite straightforward. I'll also explain what I do in case there is a faster way to do this without optimizing this specific way.

I go through an image and its rgb values. I have bins of size 256 for each color. So for every pixel I calculate the 3 bins of its rgb values. The bins essentially give me the index to access data for the specific color in a large vector. With this data, I do some calculations which are irrelevant. What I want to optimize is the accessing part.

Keep in mind that the large vector has an extra dimension. Every pixel belongs to some defined areas of the image. For every area it belongs to, it has an element in the big vector. So, if a pixel belongs in 4 areas(eg 3,9,12,13) then the data I want to access is: data[colorIndex][3],data[colorIndex][9],data[colorIndex][12],data[colorIndex][13].

I think that's enough to explain the code which is the following:

    //Just filling with data for the sake of the example
    int cols = 200; int rows = 200;
    cv::Mat image(200, 200, CV_8UC3);
    image.setTo(Scalar(100, 100, 100));
    int numberOfAreas = 50;
    //For every pixel (first dimension) we have a vector<int> containing ones for every area the pixel belongs to.
    //For this example, every pixel belongs to every area.
    vector<vector<int>> areasThePixelBelongs(200 * 200, vector<int>(numberOfAreas, 1));

    int numberOfBins = 32;
    int sizeOfBin = 256 / numberOfBins;

    vector<vector<float>> data(pow(numberOfBins, 3), vector<float>(numberOfAreas, 1));
    //Filling complete
    
    //Part I need to optimize
    uchar* matPointer;
    for (int y = 0; y < rows; y++) {
        matPointer = image.ptr<uchar>(y);
        for (int x = 0; x < cols; x++) {
            int red = matPointer[x * 3 + 2];
            int green = matPointer[x * 3 + 1];
            int blue = matPointer[x * 3];
            int binNumberRed = red / sizeOfBin;
            int binNumberGreen = green / sizeOfBin;
            int binNumberBlue = blue / sizeOfBin;

            //Instead of a 3d vector where I access the elements like: color[binNumberRed][binNumberGreen][binNumberBlue]
            //I use a 1d vector where I just have to calculate the 1d index as follows
            int index = binNumberRed * numberOfBins * numberOfBins + binNumberGreen * numberOfBins + binNumberBlue;
            vector<int>& areasOfPixel = areasThePixelBelongs[y*cols+x];
            int numberOfPixelAreas = areasOfPixel.size();
            for (int i = 0; i < numberOfPixelAreas; i++) {
                float valueOfInterest = data[index][areasOfPixel[i]];
                //Some calculations here...
            }
        }

    }

Would it be better accessing each mat element as a Vec3b? I think I'm essentially accessing an element 3 times for each pixel using uchar. Would accessing one Vec3b be faster?

John Katsantas
  • 571
  • 6
  • 20
  • 2
    [Tangent] FWIW, if `pow(numberOfBins, 3)` is using `std::pow`, then you dont want to use that anymore. `pow` only works internally with floating point types so `pow(numberOfBins, 3)` could be off by one when truncated into an integer. If you are only doing integer powers, write your own or get an integer pow function from somewhere. – NathanOliver Jul 21 '21 at 17:49
  • @NathanOliver I see. I'll just do `numberOfBins*numberOfBins*numberOfBins` for now. – John Katsantas Jul 21 '21 at 18:03
  • 1
    not sure whether it increases speed, but instead of matPointer = image.ptr(y); you should use matPointer = image.ptr(y); to iterate through the 3-channel image and to access R G and B channel of a pixel – Micka Jul 21 '21 at 18:38
  • 1
    instead of / sizeOfBin you could use shift operation if sizeOfBin is power of 2 – Micka Jul 21 '21 at 18:40
  • if dimensions are known before (e.g..as defines) I think using and accessing multi dimensional array int areasThePixelBelongs[200*200][50] is faster than vector> and same for the float – Micka Jul 21 '21 at 18:51
  • even if you need dynamic array size there is a faster way: https://stackoverflow.com/questions/17259877/1d-or-2d-array-whats-faster/17260025#17260025 – Micka Jul 21 '21 at 18:58
  • You'll want to operate in memory sizes that can fit into the processor's data cache. This will minimize the quantity of data cache reloads. – Thomas Matthews Jul 21 '21 at 19:06
  • @Micka Thanks for coming back and adding each idea you came up with. I'll give them all a try. – John Katsantas Jul 22 '21 at 00:20
  • @ThomasMatthews I've never heard of or know what data cache is. The more I code the more I hear about new stuff. I'll look into it. Any good source? – John Katsantas Jul 22 '21 at 00:23

1 Answers1

3

First of all vector<vector<T>> is not efficiently stored in memory as it is not contiguous. This as often a big impact on performance and should be avoided as mush as possible (especially when the inner arrays are of the same size). Instead of this, you can use std::array for fixed-size arrays or a flatten std::vector (with the size dim1 * dim2 * ... dimN).

Moreover, the loop is a good candidate for parallelization. You can parallelize this code easily with OpenMP. This assumes Some calculations here can be implemented in a thread-safe way (you should be careful about shared writes if any). If this code is embarrassingly-parallel, then the resulting parallel code can be much faster. Still, using multi-threading introduces some overhead which may be too big compared to the overall computation time (which is highly dependent of the content in Some calculations here).

Finally, regarding the content in Some calculations here it may or may not be possible to adapt the code so the compiler use SIMD instructions. The data[index][areasOfPixel[i]] will likely prevent most compiler to do that, but the following computation could be. Note that software prefetching and gather instructions may help to speed up a bit the data[index][areasOfPixel[i]] operation.

Note that the way you access pixels should not have a significant impact on the runtime as the computation should be bounded by the speed of the inner loop iterating on areas containing some unknown code (unless this unknown code actually access pixels too).

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59