Can I separate Python set update into several threads?

Question

I'm doing some brute-force computing and putting the results into a set all_data. Computing chunks of data gives a list of numbers new_data, which I want to add to the big set: all_data.update(new_data). Now while the computational part is easily made parallel by means of multiprocessing.Pool.map, the update part is slow.

Obviously, there is a problem if one has two identical elements in new_data, that are absent in all_data, and trying to add them at the same moment. But if we assume new_data to be a set as well, is there still a problem? The only problem I can see is the way sets are organized in memory, so the question is:

Is there a way to organize a set structure that allows simultaneous addition of elements? If yes, is it realized in Python?

`multiprocessing` provides primitives for sharing data and synchronization. Take a look at [this](http://stackoverflow.com/a/17393879/4371276) — dmg, Jan 13 '15 at 10:13

score 1 · Answer 1 · edited May 23 '17 at 10:25

In pure Python, no. Due to the GIL, all Python code (including manipulations with built-in structures) is single-threaded. The very existence of GIL is justified by eliminating the need to lock access to them.
- Even the result fetching in multiprocessing.Pool.map is already sequential.
ParallelProgramming article in SciPy wiki outlines related options on parallel code but I didn't find anything directly regarding off-the-shelf concurrent data structures.
- It does, however, mention a few C extensions under "Sophisticated parallelization" which do support parallel I/O.
Note that a set is actually a hash table which by its very nature cannot be updated in parallel (even with fine-grained locking, sub-operations of each type (look-up, insertion incl. collision resolution) have to be sequenced). So you need to either
- replace it with some other data organization that can be parallelized better, and/or
- speed up the update operations, e.g. by using shared memory

Can I separate Python set update into several threads?

1 Answers1