Performance tips mapping indexers to objects

Question

Given the following code I am able to access the pixels of an image and map them to a Color struct.

To my eye though this looks inefficient since every time I access the indexer I am reading from the array four times. It takes about 9 seconds on my computer to run.

What I am looking for is an example of how someone would optimise the code. I'm guessing that I can get some sort of reference to the array using unsafe code and read in one move but I have yet to find a decent guide to explain how you would do that and how you would apply the approach to other situations.

While a code example on its own would aid me, an explanation of exactly what is going on and why this I would improve performance would be more beneficial to myself and anyone reading.

public class Program
{
    public static void Main(string[] args)
    {
        Image img = new Image(10000,10000);

        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();      

        for (int y = 0; y < img.Height; y++)
        {
            for (int x = 0; x < img.Width; x++)
            {
                Color color = img[x,y];
            }
        }

        stopwatch.Stop();

        Console.WriteLine(
        string.Format("Time: {0} ms", stopwatch.ElapsedMilliseconds));

        Console.ReadKey();
    }
}

// Define other methods and classes here
public class Image {

    private float[] pixels;

    public Image (int width, int height){

        pixels = new float[width * height * 4];
        this.Width = width;
        this.Height = height;       
    }

   public int Width { get; private set; }

   public int Height { get; private set; }


   public Color this[int x, int y]
   {
       get
       {

           int start = ((y * this.Width) + x) * 4;
           return new Color(this.pixels[start], 
                            this.pixels[start + 1], 
                            this.pixels[start + 2], 
                            this.pixels[start + 3]);
       }

       set
       {
           int start = ((y * this.Width) + x) * 4;

           this.pixels[start] = value.R;
           this.pixels[start + 1] = value.G;
           this.pixels[start + 2] = value.B;
           this.pixels[start + 3] = value.A;
       }
   }

}

public struct Color {

    public Color (float r, float g, float b, float a):this(){
        this.R = r;
        this.G = g;
        this.B = b;
        this.A = a;
    }

    public float R {get;set;}

    public float G {get;set;}

    public float B {get;set;}

    public float A {get;set;}
}

UPDATE

So with some investigation I've been able to perform some improvements using unsafe code.

Here's one approach to my indexer:

public unsafe class Image {

    private byte* pixels;

    public Image (int width, int height){

        fixed(byte* p = &(new byte[width * height * 4])[0]){
            this.pixels = p;
        }

        this.Width = width;
        this.Height = height;
    }

   public int Width { get; private set; }

   public int Height { get; private set; }

   public Color this[int x, int y]
   {
       get { return *((Color*)(this.pixels + ((y * this.Width) + x) * 4)); }

       set { 
               Color* c = (Color*)(this.pixels + (((y * this.Width) + x) * 4));
                    c->R = value.R;
                    c->G = value.G;
                    c->B = value.B;
                    c->A = value.A;
            }
   }
}

This is around 50% faster but I am concerned about what happens to the pixelsproperty. Does that ever get cleaned up by the garbage collector? Should Image be implementing IDisposableso that I can set the value to null?

Here's a second approach:

public unsafe class Image {

    private byte[] pixels;

    public Image (int width, int height){

        pixels = new byte[width * height * 4];
        this.Width = width;
        this.Height = height;       
    }

   public int Width { get; private set; }

   public int Height { get; private set; }       

   public Color this[int x, int y]
   {
       get
       {
            fixed(byte* p = &this.pixels[0]){
                return *((Color*)(p + ((y * this.Width) + x) * 4));
            }
       }

       set
       {
            fixed(byte* p = &this.pixels[0]){
                Color* c = (Color*)(p + (((y * this.Width) + x) * 4));
                c->R = value.R;
                c->G = value.G;
                c->B = value.B;
                c->A = value.A;
            }
       }
   }
}

This is about 28% faster and as far as I can determine it means I do not have to do anything to further manage memory.

Are there any obvious flaws in either approach?

This is one that would probably fit better on [Code Review](http://codereview.stackexchange.com/). — Stephen Ross, Mar 17 '16 at 09:09
I thought that but I had a look around and saw similar question in this site plus it's also a question about the mechanisms of unsafe code. I'm sure an admin will migrate it if they think it's better there and i'll have no issue if they do. — James South, Mar 17 '16 at 09:15
Use LockBits if you need speed. More information can be found here [http://stackoverflow.com/questions/190385/how-to-manipulate-images-at-pixel-level-in-c](http://stackoverflow.com/questions/190385/how-to-manipulate-images-at-pixel-level-in-c) — Dbuggy, Mar 17 '16 at 09:22
I'm afraid not. I'm not using System.Drawing, this is for something cross platform with corefx https://github.com/JimBobSquarePants/ImageProcessor — James South, Mar 17 '16 at 09:25
The `fixed` statement pins the pointer only during the execution of the `fixed` block, so I'm thinking there's a good chance that the GC will collect the array at some point and then you'll be in trouble. You should probably use [GCHandle.Alloc](https://msdn.microsoft.com/en-us/library/1246yz8f(v=vs.100).aspx) to allocate the memory. Oh, and then you definitely need to implement the Disposable pattern correctly. — Rytmis, Mar 18 '16 at 07:37
Oh, and once you _do_ allocate the GCHandle as `GCHandleType.Pinned`, you'll want to `GCHandle.Free` it in the `Dispose` block. — Rytmis, Mar 18 '16 at 07:40

score 1 · Answer 1 · answered Mar 17 '16 at 09:34

1

You're working exclusively with 32-bit pixels. Why use a byte[] at all, when you can use uint[] instead, and use Color.FromArgb?

Buffer.BlockCopy can be used very efficiently to copy between byte[] and uint[] (and any other value type AFAICT), so if you're reading from a file, that's one option. Using unsafe code is another. In both cases, pay a lot of attention to bounds checking - you're losing a lot of memory safety guarantees.

If you're going for bare-bones speed, you're going to need a lot of unsafe code. That's very tricky, so make sure you actually need that speed - profiling your application as a whole is usually a good idea.

answered Mar 17 '16 at 09:34

Luaan

62,244
7
97
116

@james-south really has to do profiling because at least on release mode more than 80% of time goes to `Color..ctor` and this is hotspot, not array access. – Valery Petrov Mar 17 '16 at 09:45
1

I'm rather assuming that Color.FromArgb isn't available either, as this is .NET Core, where System.Drawing is conspicuously absent (for obvious reasons). :) – Rytmis Mar 17 '16 at 09:48
As @Rytmis says there is no `Color.FromArgb` This is .NET core. Also, I'm not blanket copying. – James South Mar 17 '16 at 09:51
@JamesSouth Sure, but that doesn't really matter. Optimize for whatever you're trying to do with the data - if you're doing image manipulation, working with the whole pixel at once is usually the better and faster approach. Why are you individually working with R, G, B and A, and how often do you need to do so? Retrieving a single byte from an integer is quite cheap, and operations on integers are faster than doing four separate operations on four separate bytes that aren't even aligned (and if they were aligned, every `Color` would require 16 bytes instead of 4 bytes). – Luaan Mar 17 '16 at 10:01
I shouldn't have put bytes in the example, In reality each component is a float so I can support HDR plus solve a whole bunch of rounding issues. I'm gonna update the question. – James South Mar 17 '16 at 10:04
1

@JamesSouth To make this more clear, you can't optimize for all the cases. You need to pick your battles, and different in-memory structures and operations will give you different performance under different conditions. For your sample code, my suggestion will help you quite a bit. As will returning an `uint` directly rather than using your `Color` struct. It's all about the trade-offs. What do you *need* to optimize for? What are you doing with those `Image`s? – Luaan Mar 17 '16 at 10:05
@JamesSouth Okay, in that case, why are you using `float[]` for storage instead of `Color[]`? Other than that, there's not a lot you can do with .NET Core - you're not going to have access to vectorised registers etc. IIRC. – Luaan Mar 17 '16 at 10:09
I'm doing a lot with those images, resizing, rotating, cropping various filters etc. I did think about `Color[]` at one point but chose `float[]` to allow greater interoperability with other libraries. I already have people experimenting integrating my work with game engines and suchlike. – James South Mar 17 '16 at 10:13
@JamesSouth I don't think there's much you can do *without* delving into unsafe territory. Once you're unsafe, `Color*` and `float*` can just be different views on the same block of data. Does it make sense? Probably not. You need to make the hard design decisions, and pick something to focus on. In the end, no matter what you do, some solutions will be slower than others depending on the way you arrange your data, or the way you pass arguments, or the way you premultiply colours... there's no universal approach. You could try giving multiple independent solutions, but that also hurts. – Luaan Mar 17 '16 at 11:45
@Luaan Yeah, makes sense. I've no idea how to code that though, no experience with unmanaged code. – James South Mar 17 '16 at 12:32

Performance tips mapping indexers to objects

1 Answers1