
My application streams video from a camera using the RTSPClientSharp library, which raises an OnFramesReceived event when a decoded frame is ready. I was converting the decoded frame to a Bitmap inside that event handler. The conversion is a blocking call that takes more than 100 ms, and it was slowing the frame rate down to 10 FPS.

To solve this, I used the Task Queue code from here, which queues up the ProcessFrame event (which contains the code to convert the decoded frame to a Bitmap) using Task.ContinueWith(...).Unwrap(). My aim is to execute the ProcessFrame calls sequentially, in the order the frames were received. Using the Task Queue solved the blocking problem, and I'm now able to process 30 frames per second.

However, I'm now having a memory issue: the longer my application runs, the more its memory usage gradually increases. ANTS Memory Profiler shows (check the screenshot) that ContinuationResultTaskFromTask is the largest class in Gen 2.

Update: Some facts I'd like to add: I have 10 such cameras connected to my application, and each camera has its own instance of the camera class. I'm using a 16-core processor with hyperthreading and 32 GB of RAM. Still, if the CPU can't handle the load, I would prefer to drop to 10 FPS.

private void OnFramesReceived(object sender, IDecodedVideoFrame decodedFrame)
{
    taskQueue.Enqueue(() => Task.Run(() => ProcessFrame?.Invoke(this, decodedFrame)));
}

private void HandleProcessFrame(object sender, IDecodedVideoFrame decodedFrame)
{
    try
    {
        using (Bitmap bmpBitmap = new Bitmap(m_Width, m_Height))
        {
            BitmapData bmpData = bmpBitmap.LockBits(
                new Rectangle(0, 0, bmpBitmap.Width, bmpBitmap.Height),
                ImageLockMode.WriteOnly, bmpBitmap.PixelFormat);
            try
            {
                decodedFrame.TransformTo(
                    bmpData.Scan0,
                    bmpData.Stride,
                    _transformParameters);
            }
            finally
            {
                bmpBitmap.UnlockBits(bmpData);
            }
            base.OnNewFrameEvent(this, bmpBitmap);
        }
    }
    catch (Exception ex)
    {
        Logging.LogError(ex);
    }
}
public class TaskQueue
{
    private Task previous = Task.FromResult(false);
    private object key = new object();

    public Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
    {
        lock (key)
        {
            var next = previous.ContinueWith(t => taskGenerator()).Unwrap();
            previous = next;
            return next;
        }
    }
    public Task Enqueue(Func<Task> taskGenerator)
    {
        lock (key)
        {
            var next = previous.ContinueWith(t => taskGenerator(), TaskContinuationOptions.ExecuteSynchronously).Unwrap();
            previous = next;
            return next;
        }
    }
}
  • Totally expected. You have a bucket with a hole, and you are pouring water into it faster than it drains from the hole. Changing the size of the bucket will only change the point in time when it spills over. But it will. Always. All you can do with this is prevent peaks from crashing your app. But without increasing the throughput of the bitmap conversion, you'll always run into memory problems. – Fildor Aug 26 '20 at 08:09
  • ^^ So as of now, you don't even need to think about whether this "solution" is correct or not. Even a perfectly implemented "task queue" will not solve your problem. – Fildor Aug 26 '20 at 08:13
  • Have you tried processing the queue in parallel? This may increase the speed at which the queue empties, preventing unprocessed frames from piling up. You could use some sorting or indexing to maintain the original order of the resulting bitmaps. – BionicCode Aug 27 '20 at 08:38

1 Answer


By using continuations you are creating a queue that is not centrally controlled, and also one that is not memory efficient. You are paying 200-300 bytes of overhead for each continuation, on top of the actual payload (the RawFrame). I suggest switching to something more organized and efficient, like the TPL Dataflow library.

Below is an example of using the TPL Dataflow library. A single ActionBlock, the simplest component of this library, provides the horsepower for the computations. You can configure the size of its internal queue by setting the BoundedCapacity option. When the queue becomes full, incoming messages will be dropped (the Post method will return false). You can also configure the MaxDegreeOfParallelism: you can either utilize all the available cores/processors of the machine, or leave a core or two free to do other work.

private readonly ActionBlock<RawFrame> _actionBlock;

public MyClass() // constructor
{
    _actionBlock = new ActionBlock<RawFrame>(rawFrame =>
    {
        ProcessFrame(rawFrame);
    }, new ExecutionDataflowBlockOptions()
    {
        BoundedCapacity = 10, // the default is unbounded
        MaxDegreeOfParallelism = Environment.ProcessorCount,  // the default is 1
    });
}

private void OnFramesReceived(object sender, RawFrame rawFrame)
{
    _actionBlock.Post(rawFrame);
}

The TPL Dataflow library is built into .NET Core, and is available as a package for .NET Framework.
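One failure mode worth guarding against (a hedged sketch, not part of the original answer): if the delegate passed to the ActionBlock throws, the block transitions to a faulted state and Post returns false for every subsequent frame. Attaching a continuation to the block's Completion task at least makes such failures visible (the LogError helper below is hypothetical):

```csharp
// Observe the block's Completion task, so that an exception thrown inside
// the processing delegate (which would fault the block and make Post
// return false from then on) is logged instead of passing unnoticed.
_actionBlock.Completion.ContinueWith(t =>
{
    LogError(t.Exception); // hypothetical logging helper
}, TaskContinuationOptions.OnlyOnFaulted);
```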

Theodor Zoulias
  • This assumes that just dropping work on the floor is acceptable, and we don't actually know that that's okay. – Servy Aug 26 '20 at 16:57
  • @Servy I think that it's a safe bet that the OP would prefer to drop some frames, than to clog the memory with a forever increasing number of unprocessed frames. – Theodor Zoulias Aug 26 '20 at 17:11
  • Yes, they need to do *something*, but it most likely needs to be a more complicated redesign of the application, depending on what the work being done actually is. – Servy Aug 26 '20 at 17:32
  • @Servy maybe, but my solution should be a quick and easy fix to the OP's problem. What I don't like about my solution is that the frame being dropped is the most recent one. Ideally the oldest frames should be dropped first. Unfortunately the TPL Dataflow is not configurable in that respect. The [Channels](https://devblogs.microsoft.com/dotnet/an-introduction-to-system-threading-channels/) do have this option ([`BoundedChannelFullMode.DropOldest`](https://docs.microsoft.com/en-us/dotnet/api/system.threading.channels.boundedchannelfullmode)), but implementing a solution based on Channels is trickier. – Theodor Zoulias Aug 26 '20 at 17:55
  • It's a quick and easy solution to this problem *that also creates a new, even bigger, problem*. Solving a performance problem by computing an invalid result isn't solving the performance problem in a *meaningful* way. They need to address their underlying problem of not being able to process items sufficiently quickly. – Servy Aug 26 '20 at 18:00
  • @Servy what is the problem that my solution creates? If the CPU can handle the load, then no frames will be dropped. If not, then my solution offers a relief valve. This in no way obstructs the OP from improving the performance of the processing algorithm. – Theodor Zoulias Aug 26 '20 at 18:14
  • The problem it creates is not processing frames that the OP intends to process. You're assuming it's okay to just not process them. That may well not be okay, given what they're trying to do. – Servy Aug 26 '20 at 18:21
  • @Servy yes, this is my assumption. Let's wait to see what the OP has to say about it. – Theodor Zoulias Aug 26 '20 at 18:23
  • @TheodorZoulias Thanks for the response. I'm new to the Dataflow library. I have tried implementing your solution, but soon after the streaming has started, the Post method is always returning false. – Sai Kiran Vedire Aug 26 '20 at 21:48
  • @TheodorZoulias I think I missed some facts in my question: I have 10 such cameras (@30fps) connected to my application, and each camera has its own instance of the camera class. I have added the ActionBlock in my camera constructor and set the MaxDegreeOfParallelism to 2 and the BoundedCapacity to 30, assuming this will be unique to each instance of the camera class. I'm using a 16-core processor with hyperthreading; still, Post was returning false. – Sai Kiran Vedire Aug 26 '20 at 21:48
  • @Servy Thanks for the response, you are right, I can't miss any frames. I need to run the ProcessFrame event in such a way that it doesn't block the OnFramesReceived event, or I need to bring down the processing time of ProcessFrame. I'm not sure how to improve the Bitmap creation logic to decrease the processing time, so I'm trying the threading approach. – Sai Kiran Vedire Aug 26 '20 at 21:57
  • @SaiKiranVedire a possible reason why the `Post` returns always false is that the `ActionBlock` has been faulted, because of an exception during the processing of a frame. If the `ProcessFrame` doesn't have error handling, then you could attach an error handler to the `Completion` property of the `ActionBlock`. This property is a `Task`, so you can attach a `ContinueWith`, and log the `Exception` of the task in case `IsFaulted`. – Theodor Zoulias Aug 27 '20 at 02:08
  • @SaiKiranVedire btw I think that the facts about the number of cameras and the capabilities of the machine are important information for answering the question, so you should probably update your question by adding these facts. You could also mention what is the desirable behavior in case the CPU can't handle the load. It should be a realistic expectation, like dropping frames, or terminating the App intentionally with an exception, or letting it consume all the available memory before eventually crashing with an `OutOfMemoryException` etc. – Theodor Zoulias Aug 27 '20 at 02:29
  • @TheodorZoulias Post is returning false because the queue is already full and it is ignoring the frames; I set the BoundedCapacity to 30 per camera. I have updated the question with the facts. Thanks. – Sai Kiran Vedire Aug 27 '20 at 17:49
  • @SaiKiranVedire so you are observing that (1) after the streaming has started the `Post` method is always returning false, (2) the `ActionBlock` is not faulted and (3) the queue of the `ActionBlock` is permanently full. I guess that something prevents the frames from being processed. It is possible that you have a deadlock. Does the `ProcessFrame` method include any interaction with the UI? – Theodor Zoulias Aug 27 '20 at 18:57
  • @TheodorZoulias 1. Not immediately after the streaming is started, because I see the video on the UI, so it is processing some frames, but not at 10 FPS (I have decreased the FPS from 30 to 10 on all the cameras). 2. Yes, the ActionBlock does not seem to be faulted. 3. The ActionBlock may not be permanently full, since I see the video on the UI. 4. In ProcessFrame, I update the display on the UI. – Sai Kiran Vedire Aug 27 '20 at 19:22
  • @SaiKiranVedire I suggest updating your question to include the `ProcessFrame` method. It is possible that the bottleneck is the UI thread, in which case it doesn't matter how many cores the machine you are using has. In that case you should prioritize finding a way to reduce the amount of work that is done on the UI thread. – Theodor Zoulias Aug 27 '20 at 19:29
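The Channels-based alternative with BoundedChannelFullMode.DropOldest, mentioned above as the option that drops the oldest frames first, could look roughly like the sketch below. This is an assumption-laden illustration, not a tested implementation: the CameraStream class is hypothetical, and it reuses the question's RawFrame type and a ProcessFrame method. Each camera instance gets its own bounded channel, so a slow consumer evicts stale frames instead of letting them pile up:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical per-camera wrapper, sketching a bounded queue that
// discards the oldest frame when the consumer falls behind.
public class CameraStream
{
    private readonly Channel<RawFrame> _frames = Channel.CreateBounded<RawFrame>(
        new BoundedChannelOptions(30)
        {
            FullMode = BoundedChannelFullMode.DropOldest, // evict stale frames first
            SingleReader = true, // one consumer loop per camera
            SingleWriter = true  // frames arrive from a single callback
        });

    private void OnFramesReceived(object sender, RawFrame rawFrame)
    {
        // Never blocks; when the channel is full, the oldest frame is dropped.
        _frames.Writer.TryWrite(rawFrame);
    }

    // Started once per camera, e.g. from the constructor.
    private async Task ConsumeFramesAsync()
    {
        await foreach (RawFrame frame in _frames.Reader.ReadAllAsync())
        {
            ProcessFrame(frame); // the existing bitmap-conversion logic
        }
    }
}
```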