
I have a multithreaded image-processing Python application where:

  • One thread does image acquisition, appends metadata from other sensors (accelerometer, etc.), and pushes it all to a queue
  • Another thread pops from the queue and does a computation on the images + metadata

My issue is that the pypylon command cam.RetrieveResult(5000, pylon.TimeoutHandling_ThrowException) takes ~100 ms per camera, and that is when images are already in the buffer (the external trigger signal has happened).

Here is a pseudo code of how I do it today:

Acquisition thread:

queue_element = type('queue_element', (object,), {})()
images = [None] * len(cameras)
for i, cam in enumerate(cameras):
  grab_result = cam.RetrieveResult(5000, pylon.TimeoutHandling_ThrowException)
  images[i] = grab_result.GetArray().copy()
  grab_result.Release()
queue_element.images = images
queue_element.metadata = get_metadata()
queue.append(queue_element)

Processing thread:

queue_element = queue.pop()
do_awesome_processing(queue_element.images, queue_element.metadata)
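As a side note, the bare `append`/`pop` pattern in the pseudocode is not thread-safe; Python's `queue.Queue` does the locking for you and can be bounded so acquisition cannot run ahead of processing. A minimal sketch of that producer/consumer structure, with the camera call replaced by a stub (`fake_grab` and the summing "processing" are hypothetical stand-ins for `cam.RetrieveResult(...)` and `do_awesome_processing`):

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class QueueElement:
    images: list
    metadata: dict

def fake_grab(cam_id):
    # Stand-in for cam.RetrieveResult(...).GetArray().copy()
    return [cam_id] * 4

def acquisition(q, n_frames, n_cameras):
    for frame in range(n_frames):
        images = [fake_grab(i) for i in range(n_cameras)]
        q.put(QueueElement(images=images, metadata={"frame": frame}))
    q.put(None)  # sentinel: no more frames

def processing(q, results):
    while True:
        elem = q.get()
        if elem is None:
            break
        # Stand-in for do_awesome_processing: sum all "pixels"
        results.append(sum(sum(img) for img in elem.images))

q = queue.Queue(maxsize=8)  # bounded, so the producer blocks instead of piling up
results = []
t_acq = threading.Thread(target=acquisition, args=(q, 3, 2))
t_proc = threading.Thread(target=processing, args=(q, results))
t_acq.start(); t_proc.start()
t_acq.join(); t_proc.join()
print(results)  # one summed value per frame
```

The bounded `maxsize` is the design choice that matters here: if processing falls behind, acquisition blocks on `put()` rather than growing the queue without limit.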

My idea for speeding up my image acquisition is to move the RetrieveResult command from my acquisition thread to my processing thread. I am thinking that my acquisition queue should contain pointers to the memory addresses, not the retrieved results themselves. The processing thread should then retrieve the results. Here is some pseudocode of my idea:

Acquisition thread:

queue_element = type('queue_element', (object,), {})()
pointers = [None] * len(cameras)
while cameras[0].no_new_image_in_buffer():
  sleep(0.01)
for i, cam in enumerate(cameras):
  pointers[i] = cam.PointerToNewestBufferedImage()
queue_element.pointers = pointers
queue_element.metadata = get_metadata()
queue.push(queue_element)

Processing thread:

queue_element = queue.pop()
images = [None] * len(queue_element.pointers)
for i, pointer in enumerate(queue_element.pointers):
  images[i] = RetrieveResultFromMemoryAddress(pointer, pylon.TimeoutHandling_ThrowException)
do_awesome_processing(images, queue_element.metadata)

I hope someone can guide me towards this solution.

I have functional code based on pypylon example code. However, I want to optimize so that acquisition and adding new data to the queue take as little time as possible.

  • don't even start with "pointers". They don't exist in Python, and that's a good thing. In Python, everything is a reference. -- Don't hesitate to contact Basler for support with their stuff. – Christoph Rackwitz Nov 23 '22 at 19:21
  • The problem with threads in Python is that when your work is CPU-bound, your "processing" thread will block your "waiting" threads, and eventually you have to switch to the `multiprocessing` module. As @ChristophRackwitz said, there are no pointers in Python, so your way of distributing data between threads is the only correct one. – Artiom Kozyrev Nov 24 '22 at 07:47
  • @ArtiomKozyrev One thread is here waiting for an answer from the camera. That does not affect the other thread at all. Don't go around telling people not to use threads until you really understand the limits imposed by the GIL. – zvone Dec 23 '22 at 08:01
  • @zvone If you think that I am wrong, could you explain how "waiting in another thread" will work in Python if another thread in the same process does CPU-bound tasks (another thread pops from the queue and does a computation on the images + metadata)? – Artiom Kozyrev Dec 23 '22 at 08:24
  • @zvone I would like to share the following code snippet to illustrate why it is a bad idea to combine I/O operations and heavy CPU-bound operations in one process in Python: https://gist.github.com/ArtyomKozyrev8/c67b7009f086a921ef674282237385d0 – Artiom Kozyrev Dec 23 '22 at 08:57
  • @ArtiomKozyrev Your code snippet has a single Python instruction which takes a long time to complete and keeps the GIL locked. That is a very atypical edge case which starves other threads. Almost any other example would perform better. – zvone Dec 23 '22 at 10:06
  • @zvone What is an "atypical edge case" for you? Image processing is not a "light" CPU operation. Could you explain the existence and popularity of Celery if this is not a common problem in Python? I work as a backend developer and saw this "atypical edge case" in an aiohttp/asyncio server application: the server just stops receiving HTTP requests and looks "frozen". – Artiom Kozyrev Dec 23 '22 at 12:21
  • @ArtiomKozyrev It is a *single Python instruction*. That is the problem. Replace it e.g. with an infinite loop `while True: x=x+1` and the other thread will work. The problem is that threads cannot switch within a single `10**10000000` instruction. Being CPU-intensive is not a problem. – zvone Dec 23 '22 at 16:54
  • @zvone There is a very good video by David Beazley (PyCon 2015): https://www.youtube.com/watch?v=MCs5OvhV9S4 Watch from minute 10 to 15, where David Beazley shows that a Fibonacci function (a CPU-intensive function) inside a server's request handler dramatically decreases the rps the server can handle: in the video, rps drops from about 30,000 to about 90. If he gave the request handler an even heavier task, he would eventually "freeze" his server and rps would be around 0. He uses `concurrent.futures.ProcessPoolExecutor` to overcome the issue. – Artiom Kozyrev Dec 23 '22 at 17:36
  • @zvone That is why people use Celery and message queues like RabbitMQ and NATS in production, and `multiprocessing`/`concurrent.futures.ProcessPoolExecutor` in other cases. – Artiom Kozyrev Dec 23 '22 at 17:37
  • @ArtiomKozyrev That was a really insightful video, thanks a lot! – Joachim Spange Dec 29 '22 at 08:57
  • This is a great discussion. I always thought I could solve my issue in a logical kind of way, where I set up logic for knowing when a new image appears in memory from the camera, assign it an id/timestamp, and fetch metadata then. Then, from my processing thread, I read the metadata and deduce from the logic which image to pull from memory. But it's awfully hard to code watertight. Errors will arise and I will be out of sync. #murphy's law – Joachim Spange Jan 04 '23 at 12:03
  • @JoachimSpange I used to work at a company which made a product for face/silhouette recognition, etc. I did not make the product itself, but I was in charge of creating "prototype" apps. Usually my "prototype" apps in Python received a "stream" of HTTP requests from one of the product's microservices; each request contained JSON with data like an event_id, silhouette coordinates, and a link to the video frame where the silhouette was initially detected. Then I did the requested steps with the data in my app/microservices. – Artiom Kozyrev Jan 09 '23 at 12:49
  • @JoachimSpange You can read these docs about that product: https://docs.ntechlab.com/projects/ffserver/en/4.0.3/architecture.html The architecture is not trivial, but you will probably find something useful in it. I eventually split my prototype app into 2 microservices: the 1st received HTTP requests with JSON and sent tasks to RabbitMQ; the 2nd processed the data from RabbitMQ. This design helped my prototype scale well horizontally. – Artiom Kozyrev Jan 09 '23 at 12:54
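The point the commenters converge on can be checked directly: a pure-Python CPU-bound loop does not stop a waiting thread from making progress, because the interpreter switches threads between bytecode instructions (whereas a single long-running instruction like `10**10000000` holds the GIL throughout). A minimal demonstration, with `time.sleep` standing in for a thread blocked on camera I/O such as `cam.RetrieveResult()`:

```python
import threading
import time

# CPU-bound worker: a pure-Python loop. The GIL is handed over
# periodically between bytecode instructions, so other threads still run.
def cpu_bound(stop):
    x = 0
    while not stop.is_set():
        x += 1

# "Waiting" worker: short sleeps, as a stand-in for blocking I/O.
def waiter(ticks):
    for _ in range(5):
        time.sleep(0.05)
        ticks.append(time.monotonic())

stop = threading.Event()
ticks = []
t_cpu = threading.Thread(target=cpu_bound, args=(stop,))
t_io = threading.Thread(target=waiter, args=(ticks,))
t_cpu.start(); t_io.start()
t_io.join()          # the waiting thread finishes despite the busy loop
stop.set()
t_cpu.join()
print(len(ticks))
```

This does not contradict the advice to use `multiprocessing` for heavy work: the CPU thread still steals wall-clock time from the processing path; it only shows that a thread blocked in I/O is not starved outright.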

1 Answer


The answer to this Python problem is something other than pointers. After reading Real Python's article on pointers in Python, this became clear to me.

When it comes to streaming images from cameras, the pypylon command has the same response time as the Pylon Viewer software. If you can stream at a certain fps there, then you should be able to see the same response time running cam.RetrieveResult().

However, two settings might need to be edited:

  1. USB Stream Maximum Transfer Size
  2. cam.GetStreamGrabberNodeMap().GetNode("MaxTransferSize").Value
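The second item can be set from pypylon before grabbing starts. A minimal sketch; the 4 MB value and the grab strategy are illustrative assumptions, not values from this thread — check Basler's documentation for your camera model:

```python
from pypylon import pylon

# Open the first attached camera (assumes a single USB3 Basler camera).
cam = pylon.InstantCamera(pylon.TlFactory.GetInstance().CreateFirstDevice())
cam.Open()

# Larger USB transfers mean fewer host round-trips per image.
# 4 MB is an illustrative value; tune it for your setup.
cam.GetStreamGrabberNodeMap().GetNode("MaxTransferSize").Value = 4 * 1024 * 1024

cam.StartGrabbing(pylon.GrabStrategy_LatestImageOnly)
```

The node must be set while the stream grabber is not grabbing, which is why it goes between `Open()` and `StartGrabbing()`.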

Thanks to @ChristophRackwitz and @ArtiomKozyrev for initiating me onto this great learning path.

P.S. This issue was also discussed in the pypylon community on GitHub.
