
I am using Microsoft OnnxRuntime to detect and classify objects in images and I want to apply it to real-time video. To do that, I have to convert each frame into an OnnxRuntime Tensor. Right now I have implemented a method that takes around 300ms:

public Tensor<float> ConvertImageToFloatTensor(Bitmap image)
    {
        // Create the tensor with the appropriate dimensions for the NN.
        // The loops below index it as [batch, y, x, channel], so height comes before width.
        Tensor<float> data = new DenseTensor<float>(new[] { 1, image.Height, image.Width, 3 });

        // Iterate over the bitmap width and height and copy each pixel
        for (int x = 0; x < image.Width; x++)
        {
            for (int y = 0; y < image.Height; y++)
            {
                Color color = image.GetPixel(x, y);

                data[0, y, x, 0] = color.R / (float)255.0;
                data[0, y, x, 1] = color.G / (float)255.0;
                data[0, y, x, 2] = color.B / (float)255.0;
            }
        }

        return data;
    }

I need this code to run as fast as possible, since I am drawing the detector's output bounding boxes as a layer on top of the video. Does anyone know a faster way of doing this conversion?

Ignacio

1 Answer


Based on the answers by davidtbernal (Fast work with Bitmaps in C#) and FelipeDurar (Grayscale image from binary data), you should be able to access pixels faster using LockBits and a bit of "unsafe" code:

public Tensor<float> ConvertImageToFloatTensorUnsafe(Bitmap image)
{
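    // Needs System.Drawing, System.Drawing.Imaging and Microsoft.ML.OnnxRuntime.Tensors,
    // and a project compiled with unsafe code allowed (AllowUnsafeBlocks).

    // Create the tensor with the appropriate dimensions for the NN.
    // The loops below index it as [batch, y, x, channel], so height comes before width.
    Tensor<float> data = new DenseTensor<float>(new[] { 1, image.Height, image.Width, 3 });

    BitmapData bmd = image.LockBits(new Rectangle(0, 0, image.Width, image.Height), System.Drawing.Imaging.ImageLockMode.ReadOnly, image.PixelFormat);
    // Bytes per pixel; 3 assumes a 24bpp format such as Format24bppRgb
    int PixelSize = 3;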

    unsafe
    {
        for (int y = 0; y < bmd.Height; y++)
        {
            // row is a pointer to a full row of data with each of its colors
            byte* row = (byte*)bmd.Scan0 + (y * bmd.Stride);
            for (int x = 0; x < bmd.Width; x++)
            {           
                // note the order of colors is BGR
                data[0, y, x, 0] = row[x*PixelSize + 2] / (float)255.0;
                data[0, y, x, 1] = row[x*PixelSize + 1] / (float)255.0;
                data[0, y, x, 2] = row[x*PixelSize + 0] / (float)255.0;
            }
        }

        image.UnlockBits(bmd);
    }
    return data;
}

I compared this code against your original, averaging over 1000 runs, and got about a 3x performance improvement, but results may vary.
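
For anyone who wants to repeat that kind of comparison, here is a minimal sketch of a timing helper. It is not the exact harness used for the numbers above; `TimingSketch` and `AverageMilliseconds` are hypothetical names, and it only measures mean wall-clock time per call with `Stopwatch`, after a few warm-up calls so JIT compilation is not counted.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: returns the mean wall-clock time per call, in milliseconds.
    public static double AverageMilliseconds(Action action, int runs = 1000, int warmup = 10)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        // Warm-up calls let the JIT compile the code before timing starts.
        for (int i = 0; i < warmup; i++) action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

It could be called as `TimingSketch.AverageMilliseconds(() => ConvertImageToFloatTensorUnsafe(bitmap))`, and once more with the original method, to compare the two on your own images and hardware.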

Also note that I've used 3 bytes per pixel, since your original code reads only the R, G and B values. If you use a 32bpp bitmap, change PixelSize to 4; the fourth byte (row[x*PixelSize + 3]) should then be the alpha channel.
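
If this is still too slow, one further thing to try (a sketch, not something benchmarked here) is to skip the tensor's multi-dimensional indexer and write into a flat float buffer row by row. The helper below is hypothetical (`PixelPacking.BgrRowToRgbFloats` is not part of any library): it converts one row of packed BGR or BGRA bytes into interleaved, normalized RGB floats, which matches the [1, height, width, 3] layout used above.

```csharp
using System;

public static class PixelPacking
{
    // Hypothetical helper: converts one row of packed BGR(A) bytes into
    // interleaved RGB floats in [0, 1]. pixelSize is 3 for 24bpp, 4 for 32bpp.
    public static void BgrRowToRgbFloats(ReadOnlySpan<byte> row, Span<float> dest, int width, int pixelSize)
    {
        if (pixelSize < 3) throw new ArgumentOutOfRangeException(nameof(pixelSize));
        if (row.Length < width * pixelSize) throw new ArgumentException("Row is too short.", nameof(row));
        if (dest.Length < width * 3) throw new ArgumentException("Destination is too short.", nameof(dest));

        for (int x = 0, src = 0, dst = 0; x < width; x++, src += pixelSize, dst += 3)
        {
            dest[dst]     = row[src + 2] / 255f; // R
            dest[dst + 1] = row[src + 1] / 255f; // G
            dest[dst + 2] = row[src]     / 255f; // B (any alpha byte is skipped)
        }
    }
}
```

Inside the LockBits loop, each row could be wrapped as `new ReadOnlySpan<byte>(row, bmd.Width * PixelSize)` and the destination taken as `denseTensor.Buffer.Span.Slice(y * bmd.Width * 3, bmd.Width * 3)`. That assumes the tensor is held in a `DenseTensor<float>` variable (called `denseTensor` here) and that its `Buffer` property exposes the row-major backing memory.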

fraztto
  • I have tried this code and it works, but the average tensor loading time for a 300x300 image comes out at around 65ms, which is still too high. The ML.NET internal pipeline somehow manages to do this in much less time. – Nouman Qaiser Jan 03 '22 at 09:01
  • May not be super helpful, but the samples use some pre-baked image transformations that might be doing some of the speedy parts: https://github.com/dotnet/machinelearning-samples/blob/main/samples/csharp/end-to-end-apps/ObjectDetection-Onnx/OnnxObjectDetection/ML/OnnxModelConfigurator.cs#L25 The source for those may also be helpful: https://github.com/dotnet/machinelearning/blob/main/src/Microsoft.ML.ImageAnalytics/ImagePixelExtractor.cs – Nate Lowry Jun 23 '22 at 19:36
  • in my experience, I need a `new DenseTensor(new[] { 1, image.Width, image.Height, 4 })` (with 4 in the last position of the array). But I get a `Non-zero status code returned while running Node: Status Message: Input channels C is not equal to kernel channels * group. C: 224 kernel channels: 3 group: 1` error. In the loop, I set `data[0, y, x, 2]` to `row[x*PixelSize + 3] / (float)255.0`. What could have gone wrong? – Bamdad Aug 24 '23 at 14:43