
I'm trying to create a method that filters out all pixels below a given grayscale threshold (everything below the threshold becomes black, everything above becomes white). The method works, but it is not as fast as I feel it could be.

I decided to use the Parallel class, but no matter what I set MaxDegreeOfParallelism to, I don't get any speed benefit. I perform some other operations on the bitmap too, and the total time of all the operations is always around 170 ms, regardless of MaxDegreeOfParallelism. When debugging, the filtering itself takes around 160 ms, so if parallelizing it actually helped, there should be a noticeable overall difference.

I'm using an i7 processor, 4 physical cores, 8 logical cores.

The code:

Color black = System.Drawing.Color.FromArgb(0, 0, 0);
Color white = System.Drawing.Color.FromArgb(255, 255, 255);

int lowerBound = (int)((float)lowerBoundPercent * 255.0 / 100.0);
int upperBound = (int)((float)upperBoundPercent * 255.0 / 100.0);

// split the rows into 8 horizontal bands, one per Parallel.For iteration
int[][] border = new int[8][];
for (int i = 0; i < 8; i++)
{
    border[i] = new int[] { i * height / 8, (i + 1) * height / 8 - 1 };
}

Parallel.For(0, 8, new ParallelOptions { MaxDegreeOfParallelism = 8 }, i =>
    {
        for (int k = 0; k < width; k++)
        {
            for (int j = border[i][0]; j <= border[i][1]; j++)
            {
                Color pixelColor;
                int grayscaleValue;
                pixelColor = color[k][j];
                grayscaleValue = (pixelColor.R + pixelColor.G + pixelColor.B) / 3;
                if (grayscaleValue >= lowerBound && grayscaleValue <= upperBound)
                    color[k][j] = white;
                else
                    color[k][j] = black;
            }
        }
    });

color is a jagged array of System.Drawing.Color, indexed as color[x][y].

The question: is this normal? If not, what can I do to change it?

EDIT:

Pixel extraction:

Color[][] color;
color = new Color[bitmap.Width][];
for (int i = 0; i < bitmap.Width; i++)
{
    color[i] = new Color[bitmap.Height];
    for (int j = 0; j < bitmap.Height; j++)
    {
        color[i][j] = bitmap.GetOriginalPixel(i, j);
    }
}

bitmap is an instance of my own Bitmap class:

public class Bitmap
{
    System.Drawing.Bitmap processed;
    //...
    public Color GetOriginalPixel(int x, int y) { return processed.GetPixel(x, y); }
    //...
}
user622505
  • Are you trying to test performance on debug build? – MarcinJuraszek Oct 05 '13 at 20:21
  • Actually, that's what I was doing. I will test it on release build and report back. – user622505 Oct 05 '13 at 20:22
  • How are you measuring? – spender Oct 05 '13 at 20:31
  • If you multiplied `lowerBound` and `upperBound` by 3, you wouldn't need to divide by 3 when calculating `grayscaleValue`. – Andrew Morton Oct 05 '13 at 20:34
  • Testing it on a release build is not enough; it needs to be a release build without the debugger attached. There are MANY optimizations that have to be disabled when the debugger is attached, because they would prevent you from debugging the program effectively. – Scott Chamberlain Oct 05 '13 at 20:40
  • @spender I'm measuring it by getting `DateTime.Now` before and printing `(DateTime.Now-beforeDate).ToString()` to a `TextBox` in my form. – user622505 Oct 05 '13 at 20:42
  • @AndrewMorton Didn't notice that, and it will actually yield a measurable speed up. Thanks! @all I think I found the culprit. In order to access the data without having to use locks I extracted the array from `System.Drawing.Bitmap` pixel by pixel. Turns out it takes a lot of time, so it pretty much cancels out the benefits gained from parallelizing. I don't really have an idea how to bypass it. Thanks for input, though! – user622505 Oct 05 '13 at 20:44
  • @Lasooch Show how you are extracting the pixels out of the bitmap (and putting them back in). Are you using [LockBits](http://msdn.microsoft.com/en-us/library/system.drawing.bitmap.lockbits.aspx)? That is the way you are supposed to do per-pixel operations like this on a Bitmap. Also, on your benchmarking: DateTime has an accuracy of about 15 ms at each end, so your times could be +/- 30 ms. For timing like this you really should be using [Stopwatch](http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx); see the sketch after these comments. – Scott Chamberlain Oct 05 '13 at 20:46
  • @ScottChamberlain didn't know that about `DateTime`. Tested with `Stopwatch` now and it seems like the results are the same, though. Edited to answer your other question. – user622505 Oct 05 '13 at 20:52
  • I would look into re-writing your code using LockBits, you might get a big improvement; if so, write up the new method as an answer and accept it. – Scott Chamberlain Oct 05 '13 at 20:53
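
A minimal sketch of the Stopwatch timing suggested in the last two comments; RunFilter() and resultTextBox are placeholders for the question's filtering code and the form's TextBox, not names from the original post:

var sw = System.Diagnostics.Stopwatch.StartNew();
RunFilter();                                         // placeholder: the thresholding code from the question
sw.Stop();
resultTextBox.Text = sw.ElapsedMilliseconds + " ms"; // resultTextBox is assumed; the OP only says "a TextBox in my form"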

2 Answers


To answer your main question about why your parallel method is not any faster: Parallel.For only starts out with one thread, then adds more threads as it detects that additional threads might be beneficial in speeding up the work (note that the option is MaxDegreeOfParallelism, not just DegreeOfParallelism). Quite simply, there are not enough iterations in your loop for it to spin up enough threads to be effective; you need to give it more iterations, each doing less work.

Try giving the parallel operation more iterations to distribute by looping over the width instead of over 8 chunks of the height.

Color black = System.Drawing.Color.FromArgb(0, 0, 0);
Color white = System.Drawing.Color.FromArgb(255, 255, 255);

// bounds scaled by 3 so the per-pixel sum R+G+B can be compared directly, with no divide per pixel
int lowerBound = (int)((float)lowerBoundPercent * 255.0 / 100.0) * 3;
int upperBound = (int)((float)upperBoundPercent * 255.0 / 100.0) * 3;

Parallel.For(0, width, k =>
    {
        for (int j = 0; j < height; j++)
        {
            Color pixelColor = color[k][j];
            int grayscaleValue = pixelColor.R + pixelColor.G + pixelColor.B;
            if (grayscaleValue >= lowerBound && grayscaleValue <= upperBound)
                color[k][j] = white;
            else
                color[k][j] = black;
        }
    });

I would not parallelize over both width and height, though; then you would likely run into the opposite problem of not giving each iteration enough work to do.
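
If you want to tune the chunk size without hand-rolling bands yourself, here is a rough sketch using a range Partitioner; it is only an illustration (not part of the original code) and it reuses the same color, width, height, white, black, lowerBound and upperBound as above:

// needs: using System.Collections.Concurrent; using System.Drawing; using System.Threading.Tasks;
// Partitioner.Create(0, height) splits the row range into chunks sized by the TPL,
// so each loop body processes a batch of rows instead of a single column of pixels.
Parallel.ForEach(Partitioner.Create(0, height), range =>
{
    for (int j = range.Item1; j < range.Item2; j++)
    {
        for (int k = 0; k < width; k++)
        {
            Color pixelColor = color[k][j];
            int grayscaleValue = pixelColor.R + pixelColor.G + pixelColor.B;
            color[k][j] = (grayscaleValue >= lowerBound && grayscaleValue <= upperBound) ? white : black;
        }
    }
});

Each task gets a contiguous range of rows, so the per-iteration work stays coarse enough to amortize the scheduling overhead.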

I highly recommend you go download and read Patterns for Parallel Programming; it goes into this exact example when discussing how much work you should give a Parallel.For. Look at the "Very Small Loop Bodies" and "Too fine-grained, too coarse-grained" anti-patterns starting at the bottom of page 26 of the C# version to see the exact problems you are running into.

Also, I would look into using LockBits for reading the pixel data in and out instead of GetPixel and SetPixel, as we discussed in the comments.

Scott Chamberlain
  • @Lasooch I am curious what kind of time difference you get just making the one change of doing less work per `For` loop. – Scott Chamberlain Oct 05 '13 at 22:35
  • I will check that tomorrow, it's quite late around here so I'm calling it a day. The code is still unparallelized, but just changing to the `LockBits` way made the time per frame drop from ~165 ms to ~55 ms. That's amazing! – user622505 Oct 05 '13 at 22:45
  • @Lasooch Feel free to post the LockBits version as an answer too, I wouldn't mind seeing it. – Scott Chamberlain Oct 05 '13 at 22:52

Using LockBits I managed to cut the time from ~165 ms to ~55 ms per frame. Then I proceeded to do some more research and combined LockBits with pointer operations in an unsafe context and the Parallel.For loop. The resulting code:

Bitmap class:

public class Bitmap
{
    System.Drawing.Bitmap processed;
    public System.Drawing.Bitmap Processed { get { return processed; } set { processed = value; } }
    // ...
}    

The method:

int lowerBound = 3*(int)((float)lowerBoundPercent * 255.0 / 100.0);
int upperBound = 3*(int)((float)upperBoundPercent * 255.0 / 100.0);

System.Drawing.Bitmap bp = bitmap.Processed;

int width = bitmap.Width;
int height = bitmap.Height;

Rectangle rect = new Rectangle(0, 0, width, height);
System.Drawing.Imaging.BitmapData bpData = bp.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadWrite, bp.PixelFormat);
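// NOTE: the pointer loop below assumes 4 bytes per pixel (a 32bpp format such as Format32bppArgb);
// since LockBits is called with bp.PixelFormat, a different pixel format would need different pointer math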

unsafe
{
    byte* s0 = (byte*)bpData.Scan0.ToPointer();
    int stride = bpData.Stride;

    Parallel.For(0, height, y1 =>
    {
        int posY = y1 * stride;
        byte* cpp = s0 + posY;

        for (int x = 0; x < width; x++)
        {
            int total = cpp[0] + cpp[1] + cpp[2];
            if (total >= lowerBound && total <= upperBound)
            {
                cpp[0] = 255;
                cpp[1] = 255;
                cpp[2] = 255;
                cpp[3] = 255;
            }
            else
            {
                cpp[0] = 0;
                cpp[1] = 0;
                cpp[2] = 0;
                cpp[3] = 255;
            }

            cpp += 4;
        }
    });
}

bp.UnlockBits(bpData);

With this kind of work division in the Parallel.For loop the code executes in 1-5 ms, which means approximately a 70x speed up!

I tried making the chunks for the loop 4x and 8x bigger and the time range is still 1-5 ms, so I won't go into that. The loop is fast enough anyway.
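
To show what "4x bigger chunks" means here, a rough sketch of that variant (it reuses s0, stride, width, lowerBound and upperBound from the code above, would sit inside the same unsafe block, and assumes height is a multiple of 4; otherwise the leftover rows need a separate pass):

Parallel.For(0, height / 4, chunk =>
{
    // each iteration now handles 4 consecutive rows instead of 1
    for (int y1 = chunk * 4; y1 < chunk * 4 + 4; y1++)
    {
        byte* cpp = s0 + y1 * stride;
        for (int x = 0; x < width; x++)
        {
            int total = cpp[0] + cpp[1] + cpp[2];
            byte v = (total >= lowerBound && total <= upperBound) ? (byte)255 : (byte)0;
            cpp[0] = v;
            cpp[1] = v;
            cpp[2] = v;
            cpp[3] = 255; // keep the pixel opaque, as in the code above
            cpp += 4;
        }
    }
});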

Thank you very much for your answer, Scott, and thanks everyone for input in the comments.

user622505
  • Thanks for posting your full solution, I am sure this answer will help many people in the future (I know I am going to use it as an example to point at for when other people ask similar stuff) – Scott Chamberlain Oct 06 '13 at 07:47
  • @Scott Chamberlain In that case I'll link the answer which helped me a lot, so those people can find it from here. A part of my solution is actually just a copy-paste from there: [link](http://stackoverflow.com/a/7541950/2845556) – user622505 Oct 06 '13 at 08:32