3

I have been researching the options Python offers (threading, multiprocessing, async, etc.) for handling two incoming streams and combining them. There is a lot of information about, but the examples are often convoluted, and most of them show how to split a single task across multiple threads or processes to speed it up.

I have a data stream coming in over a socket (currently UDP, as it's another application running locally on my PC, but I may switch to TCP in the future if the application needs to run on a separate PC) and a serial stream coming in via an RS232 adaptor, and I need to combine the two. The combined stream is then retransmitted on another socket.

The issue is that they come in at different rates (serial data at 125 Hz, socket data at 60-120 Hz), so I want to attach the latest serial data to each socket message.

My question is essentially: what is the best way to handle this, based on other people's experience? Since this is essentially an I/O task, it leans towards threading (which I know the GIL limits to concurrency rather than parallelism), but given the high input rate, I am wondering whether multiprocessing is the way to go.

If using threading, I guess the best way to share the data is to have one thread write the serial data to an object under a Lock, and a separate thread that, whenever new socket data arrives, acquires the lock, reads the latest serial data from the object, processes it and sends it out on the other socket. However, the main thread has a lot of work to do in between each new incoming socket message.
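A minimal sketch of the lock-based idea described above, using only the standard library. The names `LatestValue`, `serial_reader`, `read_sample` and `combine` are hypothetical and chosen for illustration; `read_sample` stands in for whatever blocking call actually returns one serial frame.

```python
import threading


class LatestValue:
    """Holds only the most recent sample; older samples are overwritten."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = None

    def set(self, value):
        with self._lock:
            self._value = value

    def get(self):
        with self._lock:
            return self._value


def serial_reader(latest, read_sample, stop_event):
    """Runs in its own thread and stores every new serial sample.

    read_sample is a hypothetical blocking callable returning one frame
    (or None on timeout); it is an assumption, not a real library call.
    """
    while not stop_event.is_set():
        sample = read_sample()
        if sample is not None:
            latest.set(sample)


def combine(socket_packet, serial_sample):
    """Placeholder for the real merge logic."""
    return (socket_packet, serial_sample)
```

The socket-handling thread would then call `combine(packet, latest.get())` for each incoming packet. The lock is held only for a single assignment or read, so neither thread should wait on the other for long.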

With multiprocessing I could use a pipe to request and receive the latest serial data from the other process, but that only offloads the serial data handling and still leaves a lot for the main process.

yivi
birdistheword99

3 Answers

2

Are you sure you need multi-threading here? If it is not strictly needed, I would definitely avoid it.

  • I haven't programmed much against serial ports and sockets lately, but as far as I know the data for both is buffered by the hardware/middleware, so from that perspective there should be no need for a thread per incoming stream.
  • Regarding the main thread that has a lot of work to do: are you sure this cannot be combined into the thread that does the I/O?

If it is somehow feasible, I would write a loop that reads from both streams alternately, processes/combines the data and writes it to the out socket:

while True:
  serial_data_in = serial_in.read()
  socket_data_in = socket_in.read()
  socket_out.write(combine(serial_data_in, socket_data_in))

Maybe some tweaking of the timeouts of the read()s is needed, to avoid missing data on one stream when no data is arriving on the other.

If that wouldn't work, I would still keep as few threads as possible. E.g. you could use one thread for the reading (like above) and use a Queue to communicate with a thread that does the processing and the writing to the out socket:

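import queue

q = queue.Queue()

# Reader thread: collects one sample from each stream and hands the pair over.
def worker_1():
  while True:
    serial_data_in = serial_in.read()
    socket_data_in = socket_in.read()
    q.put((serial_data_in, socket_data_in))

# Processing thread: combines each pair and writes it to the out socket.
def worker_2():
  while True:
    (serial_data_in, socket_data_in) = q.get()
    socket_out.write(combine(serial_data_in, socket_data_in))
    q.task_done()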

Queues take away the lower-level synchronization complexity of locking objects.

QuadU
  • Unfortunately not, I tried that originally and because of the difference in update rates I was using older serial data to update the latest UDP data. Even if I kept reading from the serial port (to clear the buffer) so the latest data was in there, the main thread just struggled to keep up – birdistheword99 Nov 30 '20 at 10:48
2

I think using select is very straightforward. It tells you which socket has data (or EOF) to read.

Actually, a similar question has been asked before: Python - Server listening from two UDP sockets

Please note that only one read from a socket returned by select is guaranteed not to block. Check again before continuing reading. That means if you are reading a data stream, read into a buffer until you receive a whole line or other data unit that can be processed.
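A minimal sketch of that read-once-then-buffer pattern for a byte stream such as TCP (with UDP, each `recv()` already returns one whole datagram). It assumes newline-delimited messages, which is an assumption for illustration only; `read_complete_lines` and `poll_once` are hypothetical helper names.

```python
import select
import socket


def read_complete_lines(sock, buffer):
    """Do ONE recv() on a readable socket and split off the complete lines.

    Returns (lines, remaining_buffer, eof). Assumes newline-delimited data.
    """
    chunk = sock.recv(4096)
    if not chunk:
        return [], buffer, True  # peer closed the connection
    buffer += chunk
    # Everything before the last newline is complete; the tail stays buffered.
    *lines, buffer = buffer.split(b"\n")
    return lines, buffer, False


def poll_once(sockets, buffers, timeout=0.1):
    """Wait until any socket is readable, then read once from each ready one.

    buffers maps each socket to its leftover bytes. Returns {sock: [lines]}.
    """
    readable, _, _ = select.select(sockets, [], [], timeout)
    result = {}
    for sock in readable:
        lines, buffers[sock], _eof = read_complete_lines(sock, buffers[sock])
        result[sock] = lines
    return result
```

Calling `poll_once` in a loop goes back through `select` before every further read, so no `recv()` is issued on a socket that has not been reported as readable.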

Your question differs from the linked one because you need to read from both the network and a serial interface. Linux has no problem with that: any file descriptor can be used with select. On Windows, however, only sockets can be used with select. I do not work with Windows, but it looks like you will need a dedicated thread for reading the serial line.

yivi
VPfB
  • +1 for the recommendation on using select, and the warnings about using it with Windows (i.e. only sockets supported). Wasn't aware of the select module. – birdistheword99 Nov 30 '20 at 10:45
1

I can suggest the approach used here - https://stackoverflow.com/a/641488/4895189. If you have a structure for the data you receive through the socket and the serial you can write those structures with timestamps to individual pipe objects.

From my experience, I would prefer multiprocessing over threading. I have used pyserial for reading from and writing to a UART, with the main thread writing and a separate thread reading. For reasons I could not determine, I missed frames on both input and output unless I added a fairly large delay (~1000 ms) between sequential write calls. In general, I find that pyserial behaves oddly with Python's threading. I am not sure whether this is due to pyserial's implementation or Python's GIL.

That being said, I think you can use the following structure for your setup based on the answer I linked to above:

Child Process 1 - Read data from Socket and write to Pipe with the timestamp
Child Process 2 - Read data using pyserial and write to Pipe with the timestamp
Main Process - Perform select on both pipe objects at an interval of your choice, combine the streams and transmit to the output socket.
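The structure above can be sketched roughly as follows. This is an illustration under stated assumptions, not a tested implementation for real hardware: `producer` is a hypothetical stand-in that emits counters instead of reading a socket or pyserial port, and `merge_latest` is a placeholder for the real combining logic. Since `select.select` accepts only sockets on Windows, the sketch swaps in `multiprocessing.connection.wait`, which also works on pipe connections there.

```python
import multiprocessing as mp
import time
from multiprocessing.connection import wait


def producer(conn, name, period_s, count):
    """Child process: stand-in for a socket or serial reader (assumption)."""
    for i in range(count):
        conn.send((time.time(), name, i))  # (timestamp, source, payload)
        time.sleep(period_s)
    conn.close()


def merge_latest(socket_msg, latest_serial):
    """Pair a socket message with the most recent serial message (may be None)."""
    return {"socket": socket_msg, "serial": latest_serial}


def main():
    sock_rx, sock_tx = mp.Pipe(duplex=False)  # (receive end, send end)
    ser_rx, ser_tx = mp.Pipe(duplex=False)
    children = [
        mp.Process(target=producer, args=(sock_tx, "socket", 0.010, 5)),
        mp.Process(target=producer, args=(ser_tx, "serial", 0.008, 5)),
    ]
    for child in children:
        child.start()
    # Close the parent's copies of the send ends so EOF arrives when children exit.
    sock_tx.close()
    ser_tx.close()

    latest_serial = None
    open_conns = [sock_rx, ser_rx]
    while open_conns:
        for conn in wait(open_conns, timeout=1.0):
            try:
                msg = conn.recv()
            except EOFError:
                open_conns.remove(conn)
                continue
            if conn is ser_rx:
                latest_serial = msg  # remember only the newest serial frame
            else:
                print(merge_latest(msg, latest_serial))
    for child in children:
        child.join()


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters on Windows, where child processes are started by re-importing the main module. The timestamps are carried along so the main process can check how stale the serial frame is when it pairs it with a socket message.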

Divyesh Peshavaria
  • Like it, allows me to use select with the two pipe objects on Windows (can't use select on pySerial with Windows) and then each process only handles its own stream and data processing. EDIT: (Will accept the answer once I've tried it and can confirm it works) – birdistheword99 Nov 30 '20 at 10:57