A quick test shows that pickle engages the GIL even when its C implementation is used (in Python 3.6.9, `import pickle` uses the C accelerator, formerly cPickle, by default).

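import os
import pickle
import threading
import timeit

# 10 MB of random bytes to serialize
big_data = os.urandom(10000000)

def run():
    # One full serialize/deserialize round trip
    pickle.loads(pickle.dumps(big_data))

t = timeit.Timer(run)

# Run the same benchmark concurrently in 4 threads
for _ in range(4):
    threading.Thread(target=lambda: t.timeit(number=2000)).start()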

That test, with 4 threads running serialization operations, runs at 100% CPU, i.e. it engages the GIL. The same type of test running a numpy operation uses 400% CPU (no GIL engaged with numpy).

I was hoping cPickle, being a C function, wouldn't engage the GIL. Is there any way around this? I'd like to be able to deserialize a large amount of data without blocking the main process.

I am trying to pull in upward of 3GB of data per second from worker processes back to main. I can move the data with streaming sockets and asyncio at 4GB/sec, but the deserialization is a bottleneck. I don't have the luxury of Python 3.8 and SharedMemory yet unfortunately.

An acceptable answer is, of course, a confident No.

David Parks
  • I don't see why the fact that the module is a C extension should make you think that it wouldn't engage the GIL. From my understanding, the fundamental problem the GIL solves is thread-safe access to Python interpreter-level objects, which rely on reference counting for garbage collection. Since `pickle` serialization/deserialization touches Python objects that other threads might have access to, it has to engage the GIL. – juanpa.arrivillaga Nov 13 '19 at 21:13
  • @juanpa.arrivillaga I'll accept that if you post that as an answer. Your explanation sounds quite likely to be correct. I was equating C functions to external functions that can release the GIL as numpy does, but as you point out that doesn't seem reasonable in the case of Python object serialization. – David Parks Nov 13 '19 at 23:58

1 Answer

Posting @juanpa.arrivillaga's comment as an answer to close this question:

I don't see why the fact that the module is a C extension should make you think that it wouldn't engage the GIL. From my understanding, the fundamental problem the GIL solves is thread-safe access to Python interpreter-level objects, which rely on reference counting for garbage collection. Since pickle serialization/deserialization touches Python objects that other threads might have access to, it has to engage the GIL.
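
A rough way to check this on your own machine is to time the same number of pickle round trips once in a single thread and once split across several threads. This is only a minimal sketch, not a rigorous benchmark: the payload size and the helper names (`round_trip`, `time_serial`, `time_threaded`) are made up for illustration. On a standard CPython build with the GIL, I would expect the threaded run to take about as long as the serial one rather than finishing several times faster.

```python
import os
import pickle
import threading
import time

# Hypothetical payload size, kept small so the sketch finishes quickly.
payload = os.urandom(1_000_000)


def round_trip(repeats):
    """Serialize and deserialize the payload `repeats` times."""
    for _ in range(repeats):
        pickle.loads(pickle.dumps(payload))


def time_serial(total_repeats):
    """Wall-clock seconds for all round trips in the calling thread."""
    start = time.perf_counter()
    round_trip(total_repeats)
    return time.perf_counter() - start


def time_threaded(total_repeats, n_threads):
    """Wall-clock seconds for the same work split across n_threads."""
    per_thread = total_repeats // n_threads
    threads = [
        threading.Thread(target=round_trip, args=(per_thread,))
        for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    serial = time_serial(200)
    threaded = time_threaded(200, 4)
    # If pickle released the GIL, threaded would be much smaller than serial.
    print(f"serial:   {serial:.3f} s")
    print(f"threaded: {threaded:.3f} s")
```

If the two timings come out close to each other, the threads are taking turns on the GIL instead of running in parallel, which matches the 100% CPU observation in the question.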

David Parks