
My code at large takes a Python dictionary that contains database column names and their contents, binarizes the columns (struct.pack, array.array, str.encode), and sends them using socket.sendall.

To improve speed, I wrote the binarization part as a generator function that yields the binary chunks. The generator is handed to a thread, which produces the chunks and puts them in a queue; the main thread collects them from the queue and sends them away.

However, I still don't get the speed improvement I expected. I figured I'd try using an auxiliary process instead of an auxiliary thread. The problem is that I can't pass a generator to a process -- generators are "not picklable".

I would be grateful for any suggestions / feedback / insights on how to go about this type of mechanism.

EDIT: When profiling the code with snakeviz (cProfile with graphics), I saw that socket.recv takes 3/4 of the time, and time.sleep (waiting for chunks in the main thread) another 1/4. That 1/4 is what I thought I could mitigate with another thread/process, as I read that both blocking socket operations and time.sleep are supposed to release the GIL.
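For reference, here is a minimal sketch of the pattern I described; the `binarize` serializer and the row contents are placeholders for my real column-encoding code, and the consumer uses a blocking `Queue.get` rather than a sleep loop:

```python
import queue
import struct
import threading

def binarize(row):
    # Placeholder serializer: yields one length-prefixed UTF-8 chunk per column.
    for name, value in row.items():
        encoded = str(value).encode("utf-8")
        yield struct.pack("!I", len(encoded)) + encoded

def produce(row, chunk_queue):
    # Worker thread: run the generator and feed chunks into the queue.
    for chunk in binarize(row):
        chunk_queue.put(chunk)
    chunk_queue.put(None)  # sentinel: no more chunks

chunk_queue = queue.Queue(maxsize=20)
worker = threading.Thread(target=produce,
                          args=({"id": 1, "name": "alice"}, chunk_queue))
worker.start()

chunks = []
while True:
    chunk = chunk_queue.get()  # blocks until a chunk is ready
    if chunk is None:
        break
    chunks.append(chunk)       # in the real code, this is socket.sendall(chunk)
worker.join()
```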

Jay
  • Not following the edit: Your original posting says nothing about `socket.recv`. And I don't understand how `time.sleep` comes into it. Using a sleep would not be a good way to wait for data from a queue, and though I'm not an expert on the python `Queue` implementation, I don't believe it does that. (Are you using something other than `queue.Queue`?) – Gil Hamilton Nov 20 '17 at 17:56
  • I try to read from the queue, then sleep(0.1) and try again... I based it on some Queue usage examples. socket.recv is just what cProfile tells me about the relative time spent, it could perhaps be related to the insert/commit delay, as I assume recv is blocking – Jay Nov 20 '17 at 20:49
  • To repeat: using sleep is not a good way to wait for data -- you're just wasting time. Why do you not just call `Queue.get`? Its default behavior is to block if necessary until there is data available. – Gil Hamilton Nov 20 '17 at 22:04
  • And again, `recv` was not mentioned in your original posting. If you're simply serializing data, then sending it, that requires no `socket.recv` calls, so there's no way that `recv` can be taking time -- which implies that you're not accurately describing what you're actually doing. – Gil Hamilton Nov 20 '17 at 22:12
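The point about `Queue.get` from these comments can be demonstrated directly: its default behavior is to block until an item is available, so a sleep-and-retry loop is unnecessary. A small self-contained illustration (the delay and payload are arbitrary):

```python
import queue
import threading
import time

q = queue.Queue()

def delayed_put():
    # Simulate a producer that takes a little while to make a chunk.
    time.sleep(0.05)
    q.put(b"chunk")

threading.Thread(target=delayed_put).start()

# No sleep-and-retry loop: get() simply blocks until the item arrives.
start = time.monotonic()
item = q.get()
waited = time.monotonic() - start
```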

1 Answer


I don't see how you would expect to have any performance improvement by doing this in another process. The function of producing a binary representation (serialization) of your dictionary is completely CPU bound -- and, relatively speaking, should be very fast -- while the function of sending it to another system is I/O bound, and will almost certainly be slower and the ultimate bottleneck.

In fact, I wouldn't be surprised if delivering these binary chunks you've created through a queue from one thread to another takes more time than simply running the serializer directly in the socket-sending thread when you factor in thread context-switching and queue insertion/extraction/synchronization overhead (and the effects of the GIL if you're using CPython). And the queue sending/synchronization overhead is unlikely to change for the better if you move the serialization to a separate process.

If you want to do the sending concurrently with other activity -- you're concerned with tying up your main thread for a long time -- then you should probably just delegate the entire task (serialization plus sending) to another thread or process.
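To sketch what I mean (using a local `socket.socketpair` to stand in for your real connection, and a hypothetical `binarize` serializer in place of your actual column-encoding code), both serialization and sending can live in one worker thread, with no queue at all:

```python
import socket
import struct
import threading

def binarize(row):
    # Hypothetical serializer: one length-prefixed UTF-8 chunk per column.
    for value in row.values():
        encoded = str(value).encode("utf-8")
        yield struct.pack("!I", len(encoded)) + encoded

def serialize_and_send(row, sock):
    # Serialization and sending both happen in the worker thread,
    # so no inter-thread queue or hand-off is needed.
    for chunk in binarize(row):
        sock.sendall(chunk)
    sock.close()

# Demo: a local socket pair stands in for the real database connection.
sender, receiver = socket.socketpair()
worker = threading.Thread(target=serialize_and_send,
                          args=({"id": 7, "name": "bob"}, sender))
worker.start()

# Meanwhile the main thread is free; here it just drains the other end.
received = b""
while True:
    data = receiver.recv(4096)
    if not data:
        break
    received += data
worker.join()
receiver.close()
```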

Another thing to understand is that when you first begin to send on a socket, the kernel will copy the initial data into internal buffers and immediately return control to you -- the kernel then breaks the data up into packets and sends them on the wire asynchronously, as the protocol permits. So, the sends will not (at first) appear to be I/O bound. But if you have many megabytes of data to send, eventually the kernel buffer space allotted to your socket will fill, and then your thread will be blocked until enough packets have been sent and acknowledged by the peer to free up some of that space.

In other words, in a single-threaded implementation, if G means generate a chunk of data and S is a socket.sendall call, your total time spent in each phase will look something like this:

G|S|G|S|G|S|G|S|G|S-------|G|S------|G|S-----|G|S------ ...

At first, the sends will seem near-instantaneous, but after a while, will start to take longer to complete. And if you aren't generating enough data to experience this effect, then it's even less likely you have any need to push the serialization to a separate thread.

Gil Hamilton
  • Very interesting, thank you! The reason I think the queue might be helpful is it actually fills - getting to around 11 chunks (out of 20 I allowed) in my printouts. I believe I had a slight improvement when timing it, though I can't currently reproduce (this was at work). I didn't know queues had overhead, is there perhaps a "lighter" way of doing this in 2 threads? Also, please read my edit about the GIL. What am I missing? – Jay Nov 18 '17 at 07:59
  • As I explained, I don't think you can expect any improvement by splitting the work between two threads. – Gil Hamilton Nov 20 '17 at 18:05