
I'm working on 10 megapixel images taken by a video camera.

The aim is to store the grayscale value of each pixel in a matrix (a two-dimensional array).

I first used GetPixel, but it took 25 seconds. Now I use LockBits, but it still takes 10 seconds, and 3 if I don't save the results to a text file.

My tutor said I don't need to save the results, but 3 seconds is still too slow. So am I doing something wrong in my program, or is there something faster than LockBits for my application?

Here is my code:

public void ExtractMatrix()
{
    Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");

    int[,] GRAY = new int[3840, 2748]; //Matrix with "grayscales" in INTeger values

    unsafe
    {
        //create an empty bitmap the same size as original
        Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);

        //lock the original bitmap in memory
        BitmapData originalData = bmpPicture.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

        //lock the new bitmap in memory
        BitmapData newData = bmp.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

        //set the number of bytes per pixel
        // here is set to 3 because I use an Image with 24bpp
        int pixelSize = 3;

        for (int y = 0; y < bmpPicture.Height; y++)
        {
            //get the data from the original image
            byte* oRow = (byte*)originalData.Scan0 + (y * originalData.Stride);

            //get the data from the new image
            byte* nRow = (byte*)newData.Scan0 + (y * newData.Stride);

            for (int x = 0; x < bmpPicture.Width; x++)
            {
                //create the grayscale version
                byte grayScale =
                   (byte)((oRow[x * pixelSize] * .114) + //B
                   (oRow[x * pixelSize + 1] * .587) +  //G
                   (oRow[x * pixelSize + 2] * .299)); //R

                //set the new image's pixel to the grayscale version
                //   nRow[x * pixelSize] = grayScale; //B
                //   nRow[x * pixelSize + 1] = grayScale; //G
                //   nRow[x * pixelSize + 2] = grayScale; //R

                GRAY[x, y] = (int)grayScale;
            }
        }

        bmpPicture.UnlockBits(originalData);
        bmp.UnlockBits(newData);
    }
}
(asked by Elo Monval; edited by tomfanning)
  • You could potentially speed this up by using the [TPL](http://msdn.microsoft.com/en-us/library/dd537608.aspx) to make the for loops run in parallel. – Bob Vale May 07 '13 at 11:14
  • Is the pixel format you specify when locking the image the same as the images native format? – CodesInChaos May 07 '13 at 11:19
  • Find what part is slow. You're doing 10 million iterations. If something in the inner loop can be optimized, you can gain a great performance increase. – CodeCaster May 07 '13 at 11:21
  • Yes, the image pixel format is 24bpp originally. – Elo Monval May 07 '13 at 11:24
  • I know I have 10 million iterations, but I don't know how I could do it faster; every single line in the loop seems important to me. – Elo Monval May 07 '13 at 11:25
  • Unroll the inner loop. – leppie May 07 '13 at 11:37
  • If you want it faster, use single precision (aka `float`) instead of `double`. Just add `f` to those floating point constants. – leppie May 07 '13 at 11:40
  • @leppie Since .net doesn't use SSE I doubt float would be much faster than double. – CodesInChaos May 07 '13 at 12:05
  • I would strongly suggest that you use `Stopwatch` to time parts of your code to see where it's taking all the time. Do you know that the loop itself is taking 3 seconds? – Jim Mischel May 07 '13 at 12:58

7 Answers


Here are some more optimizations that may help:

  1. Use jagged arrays (`[][]`); in .NET, accessing them is faster than multidimensional arrays;

  2. Cache properties that will be used inside of a loop. Though this answer states that JIT will optimize it, we don't know what's happening internally;

  3. Multiplication is (generally) slower than addition;

  4. As others have stated, `float` is faster than `double`, though that mainly applies to older processors (~10+ years). The only upside here is that the constants consume less memory (especially relevant with this many iterations);

    Bitmap bmpPicture = new Bitmap(nameNumber + ".bmp");
    
    // jagged instead of multidimensional 
    int[][] GRAY = new int[3840][]; //Matrix with "grayscales" in INTeger values
    for (int i = 0, icnt = GRAY.Length; i < icnt; i++)
        GRAY[i] = new int[2748];
    
    unsafe
    {
        //create an empty bitmap the same size as original
        Bitmap bmp = new Bitmap(bmpPicture.Width, bmpPicture.Height);
    
        //lock the original bitmap in memory
        BitmapData originalData = bmpPicture.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    
        //lock the new bitmap in memory
        BitmapData newData = bmp.LockBits(
           new Rectangle(0, 0, bmpPicture.Width, bmpPicture.Height),
           ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);
    
        //set the number of bytes per pixel
        // here is set to 3 because I use an Image with 24bpp
        const int pixelSize = 3; // const because it doesn't change
        // cache Scan0 and Stride; we don't know if BitmapData recalculates them internally on every property access
        byte* originalScan0 = (byte*)originalData.Scan0;
        byte* newScan0 = (byte*)newData.Scan0;
        int originalStride = originalData.Stride;
        int newStride = newData.Stride;
        // cache properties: reading a local variable is normally faster than a property
        // (and we don't really know if the getter recalculates anything internally)
        int bmpwidth = bmpPicture.Width;
        int bmpheight = bmpPicture.Height;
    
        // row pointers, advanced by one stride per row (addition instead of y * Stride)
        byte* oRow = originalScan0;
        byte* nRow = newScan0;
    
        for (int y = 0; y < bmpheight; y++)
        {
            int pixelPosition = 0;
            for (int x = 0; x < bmpwidth; x++)
            {
                //create the grayscale version
                byte grayScale =
                   (byte)((oRow[pixelPosition] * .114f) + //B
                   (oRow[pixelPosition + 1] * .587f) +  //G
                   (oRow[pixelPosition + 2] * .299f)); //R
    
                //set the new image's pixel to the grayscale version
                //   nRow[pixelPosition] = grayScale; //B
                //   nRow[pixelPosition + 1] = grayScale; //G
                //   nRow[pixelPosition + 2] = grayScale; //R
    
                GRAY[x][y] = (int)grayScale;
    
                pixelPosition += pixelSize;
            }
    
            //move to the next row
            oRow += originalStride;
            nRow += newStride;
        }
    
        bmpPicture.UnlockBits(originalData);
        bmp.UnlockBits(newData);
    }
– Jesse
  • Good suggestions, but I think you've missed the main problem with the code: it's (accidentally) transposing the bitmap, which is a very cache-unfriendly operation if done naively. – Daniel May 07 '13 at 13:21
  • @Daniel Yup, I caught on to that, but decided to just focus on optimization using the existing code. Good point, though. =) – Jesse May 07 '13 at 14:18

Your code is converting from a row-major representation into a column-major representation. In the bitmap, pixel (x,y) is followed by (x+1,y) in memory; but in your GRAY array, pixel (x,y) is followed by (x,y+1).

This causes inefficient memory access when writing, as every write touches a different cache line; and you end up thrashing the CPU cache if the image is big enough. This is especially bad if your image size is a power of two (see Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?).

Store your array in row-major order as well if possible to avoid the inefficient memory access (replace GRAY[x,y] with GRAY[y,x]).

If you really need it in column-major order, look at more cache-friendly algorithms for matrix transposition (e.g. A Cache Efficient Matrix Transpose Program?)
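The cost of the transposed access pattern is easy to measure in isolation. Here is a minimal, self-contained sketch (my own, not from the answer: plain arrays, no bitmap involved, class and variable names are mine) comparing the two traversal orders on an image-sized matrix:

```csharp
using System;
using System.Diagnostics;

class TraversalDemo
{
    const int H = 2748, W = 3840; // same dimensions as the question's image

    public static void Main()
    {
        int[,] a = new int[H, W];
        var sw = Stopwatch.StartNew();

        // row-major: the last index varies fastest, matching how .NET
        // stores the array, so writes hit consecutive addresses
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                a[y, x] = x;
        long rowMajorMs = sw.ElapsedMilliseconds;

        sw.Restart();
        // column-major: consecutive writes land W * 4 bytes apart,
        // so nearly every one touches a different cache line
        for (int x = 0; x < W; x++)
            for (int y = 0; y < H; y++)
                a[y, x] = x;
        long colMajorMs = sw.ElapsedMilliseconds;

        Console.WriteLine($"row-major: {rowMajorMs} ms, column-major: {colMajorMs} ms");
    }
}
```

On typical hardware the second pass is noticeably slower, which is the slowdown described above.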

– Daniel
  • I don't understand why you say GRAY registers (x,y) then (x,y+1). The first loop is y=0, x=0, then y=0 and x=1, etc... – Elo Monval May 08 '13 at 07:53
  • @EloMonval: I'm talking about the order the elements are stored in memory. Your loop accesses the array in a different order than it is stored in memory, which causes a significant slowdown due to inefficient cache usage. – Daniel May 08 '13 at 10:29

Your code may not be optimal, but a quick skim shows that even this version should run in a fraction of a second. That suggests there's some other problem:

Are you:

  • Compiling in Release mode? Debug mode turns off various optimizations
  • Running with a debugger attached? If you run from Visual Studio using F5 then (with the default C# key shortcuts) the debugger will be attached. This can dramatically slow down your program, particularly if you have any breakpoints or IntelliTrace enabled.
  • Running on some limited device? It sounds like you're running on a PC, but if you're not, then device-specific limitations might be relevant.
  • I/O limited? Although you talk about a video camera, your code suggests you're dealing with the filesystem. Any file-system interaction can be a bottleneck, particularly once networked disks, virus scanners, physical platters and fragmentation come into play. A 10 mp image is 30MB (if uncompressed RGB without an alpha channel), and reading/writing that could easily take 3 seconds depending on the details of the filesystem.
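Jim Mischel's Stopwatch suggestion from the comments is the quickest way to tell these apart. A minimal sketch of the pattern (my own; the DoWork body is just a stand-in for one stage of the real program, e.g. the Bitmap load or the conversion loop):

```csharp
using System;
using System.Diagnostics;

class StopwatchDemo
{
    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        DoWork(); // stage 1: e.g. new Bitmap(...) -- the disk read
        Console.WriteLine($"stage 1: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        DoWork(); // stage 2: e.g. the LockBits + grayscale loop
        Console.WriteLine($"stage 2: {sw.ElapsedMilliseconds} ms");
    }

    // stand-in workload so the sketch runs on its own
    static void DoWork()
    {
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) sum += i;
    }
}
```

Timing each stage separately tells you whether the 3 seconds are spent in the loop or in the filesystem.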
– Eamon Nerbonne

I'm not sure why the second part of the inner for loop is commented out, but if you don't need that, you're doing some unnecessary casting. Removing it might improve your performance.

Also, as leppie suggested, you could use single precision floats:

        for (int x = 0; x < bmpPicture.Width; x++)
        {
            //create the grayscale version
           GRAY[x, y] =
               (int)((oRow[x * pixelSize] * .114f) + //B
               (oRow[x * pixelSize + 1] * .587f) +  //G
               (oRow[x * pixelSize + 2] * .299f)); //R

        }
– Rik
  • So you are saying casting to an `int` will be faster than casting to a `byte`? – leppie May 07 '13 at 11:36
  • @leppie, no, I'm saying casting to an int might be faster than casting to a byte and _then_ casting to an int. – Rik May 07 '13 at 11:41
  • And the second part is commented out because I don't need it. I just use it sometimes to have a visual result, to be sure of what I'm doing. – Elo Monval May 07 '13 at 11:42

You can try to avoid the multiplications by setting up a pointer at the start of the row and incrementing it as you go:

byte* p = oRow;
for (int x = 0; x < bmpPicture.Width; x++)
{
    GRAY[x, y] =
       (int)((*p++ * .114) + //B
       (*p++ * .587) +  //G
       (*p++ * .299)); //R
}

This will speed up your code, but I'm not sure it will be significantly faster.

Note: this only speeds things up when iterating through an array of a value type, and will not work if oRow changes to some other type.

– Francesco De Lisi

Here's an alternative transformation that uses only integer arithmetic. It's slightly different (due to rounding of the factors), but not anything you'd notice with the naked eye: (not tested)

byte grayScale = (byte)((
      (oRow[pixelPosition] * 29) +      //B (~ .114 * 256)
      (oRow[pixelPosition + 1] * 151) + //G (~ .587 * 256)
      (oRow[pixelPosition + 2] * 76)) >> 8); //R (~ .299 * 256)

The scale factors are the old ones multiplied by 256 and rounded so they sum to exactly 256; the shift at the end divides by 256.
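As a quick sanity check (mine, not part of the original answer), the integer version can be compared against the floating-point one across sampled channel values, using the factors 29, 151 and 76, which sum to exactly 256. The two results stay within a couple of gray levels of each other:

```csharp
using System;

class GrayscaleCheck
{
    public static void Main()
    {
        int maxDiff = 0;
        // sample every 5th value of each channel to keep the check fast
        for (int b = 0; b < 256; b += 5)
            for (int g = 0; g < 256; g += 5)
                for (int r = 0; r < 256; r += 5)
                {
                    byte floatVersion = (byte)(b * .114 + g * .587 + r * .299);
                    byte intVersion = (byte)((b * 29 + g * 151 + r * 76) >> 8);
                    maxDiff = Math.Max(maxDiff, Math.Abs(floatVersion - intVersion));
                }
        Console.WriteLine($"max difference: {maxDiff}");
    }
}
```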

– harold (edited by Jesse)

A huge optimization can be achieved by using a 1D array instead of a 2D array.

Everything else will not give you a big speedup...
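A sketch of what that change looks like (my own illustration, names are hypothetical): element (x, y) becomes gray[y * width + x], the row offset is hoisted out of the inner loop, and the access pattern stays row-major, matching the bitmap.

```csharp
using System;

class OneDArrayDemo
{
    public static void Main()
    {
        const int width = 3840, height = 2748;
        int[] gray = new int[width * height]; // one contiguous block

        for (int y = 0; y < height; y++)
        {
            int rowStart = y * width;   // computed once per row
            for (int x = 0; x < width; x++)
                gray[rowStart + x] = x; // the grayscale value would go here
        }

        Console.WriteLine(gray[1 * width + 5]); // element (x=5, y=1): prints 5
    }
}
```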

– WhileTrueSleep