Linear Transform of a 2D Photo

Question

Preface: I'm not sure whether to categorize this as more of a math question or a programming question, plus I'm a noob at Linear Algebra and doing this for a hobby. Any help is greatly appreciated.

Let's say I've got some arbitrary programming language that doesn't have any established mechanisms to perform linear transformations on say a bitmap image (or any arbitrary collection of x,y values). Let's say I want to perform some arbitrary rotation, scale, and translation.

Right now I would iterate through each x,y, get the pixel color, and perform my transformation on it, round to the nearest new x,y to copy my pixel color value to, and then produce a final image. It works for a simple rotation that I've precalculated, but takes several seconds to calculate inside my TransformImage method below on an i5 so I'm wondering what's a faster way?

This is my current method just tesing in C#:

        Color[,] BackupOriginal = OriginalColorData.Clone() as Color[,];
        float angle = 25.0f;
        float angleRadians = (float)(angle * (Math.PI / 180f));
        float cosAngle = (float)Math.Cos(angleRadians);
        float sinAngle = (float)Math.Sin(angleRadians);
        BackupOriginal = LinearTransformation.TransformImage(BackupOriginal, new float[2, 2] {
            {cosAngle,-1f * sinAngle},
            {sinAngle,cosAngle}
        });

...

    public static Color[,] TransformImage(Color[,] originalImage, float[,] transformationMatrix)
    {
        if (transformationMatrix.GetUpperBound(1) < 1 || transformationMatrix.GetUpperBound(0) < 1) return null;

        int width = originalImage.GetUpperBound(1) + 1;
        int height = originalImage.GetUpperBound(0) + 1;
        Color[,] newImage = new Color[height, width];
        for (int y=0;y<height;y++)
        {
            for (int x=0;x<width;x++)
            {
                Color currentPixel = originalImage[y, x];
                int newX = (int)Math.Round((x * transformationMatrix[0, 0]) + (y * transformationMatrix[0, 1]));
                int newY = (int)Math.Round((x * transformationMatrix[1, 0]) + (y * transformationMatrix[1, 1]));
                if (IsValidPixel(newX, newY, width, height))
                    newImage[newY, newX] = currentPixel;
            }
        }
        return newImage;
    }

I believe this is more of a question for stackoverflow – Mickey Jun 24 '17 at 18:35 — Mickey, Jun 24 '17 at 18:35
Ok I'll try there thanks – John Ernest Jun 24 '17 at 18:35 — John Ernest, Jun 24 '17 at 18:35

score 0 · Accepted Answer · answered Jun 26 '17 at 22:51

Basically, this is an operation where you do not want to implement it yourself in an arbitrary language. There are several concerns that you need to consider:

All bounds-checking can lead to some expense. C# will check the limits on all array indices and so do you. For every pixel. If you are lucky, the JIT compiler oves some of this out, but it's still costly. 2D arrays are more costly than the more common 1D case as well specifically in C#, Why are multi-dimensional arrays in .NET slower than normal arrays?
Conversions back and forth between floating point and integers are costly. You are converting int to float four times and float to int two times. Per pixel.

We could stop here and just consider the number of operations you are now doing for every pixel, beyond the trivial assignment.

However, more importantly:

Memory is not uniform. You want to read and write to it in somewhat contiguous blocks. You are reading in a perfect order now, but depending on the transform, you are writing all over the place. This is hard to optimize for an "arbitrary transform", but the general idea is to try to do small blocks that fit your cache hierarchy. Since the transform is linear, any small tile in one space will very roughly end up in a similar place in the other space.
Modern CPUs are very good at vector operations. With the proper logic, chunks of 32 bytes at a time (8 32-bit pixels) can be copied, and the same goes for computing the new pixel locations.

Then, you have the issues of what to do with aliasing. There is no easy way to guarantee that every pixel in newImage will map to one and only one pixel in originalImage. You might end up writing to the same pixel multiple times (where you might really want to blend the results), and some pixels might end up empty.

So, what can be done in a reasonably short code snippet? Not too much. If you are looking for performance, I would at least try to get away from Math.Round (just add .5 before casting for crude rounding), and ideally avoid floating point altogether. An option is to store an integer multiplier and then shift (the >> operator) the result back to the same range, i.e. store cosAngle as cosAngle * 65536 and shift newX by 16. Per the previous comment, maybe even storing the transformation matrix as four individual float values could be preferable over the 2D array there. I would have hoped the JIT compiler handled that case.

But, in the end, there is no magic to it. Performant implementations tend to be longer and also tend to be written in other languages. A C or C++ implementation of the same logic could speed up things a bit, but you would still have to handle especially the memory bandwidth issue, where a sequential loop over the full source image will give you an unfavorable memory access pattern in the target for arbitrary angles. That part you can test yourself, is the angle of 0 measurably faster than, say, .4?

Much of the speedup from a C implementation would come from the optimizing compiler being actively tweaked for emitting neat code for compute-heavy loops, while the .NET environment in practice focuses more on line-of-business applications.

I don't think the use of the Color type here is a problem, since .NET structs tend to be fast, but if you would go for evaluating performance, I would probably try to use just a simple int or float array of grayscale values to get an idea if the OO abstraction somehow gets in the way of proper optimization.

I'm going ahead and accepting this as the answer, because it's very thorough and steered me in exactly the direction I needed to go. Interestingly enough, my unoptimized calculations up there came out about the same speed as replacing the rounding calls with MatrixTransform.TransformPoint in .NET. — John Ernest, Jun 26 '17 at 23:03
I should have been a little clearer that the top section that precalculates the Matrix used for transformation is only done once. What everything here suggests then, is that it goes back to 2D arrays being slow. — John Ernest, Jun 26 '17 at 23:05
I also had unwittingly ran this in Debug, which I'm sure added to a lot of speed degradation. It's running about 2x the speed now. Although I have a question now about single dimension array access. If you need to calculate an X/Y offset within a single dimension (by saying something like y*width+x) are you still sacrificing some speed by doing multiplications? — John Ernest, Jun 26 '17 at 23:21

Linear Transform of a 2D Photo

1 Answers1