0

I'm doing some brute-force computing and putting the results into a set all_data. Computing chunks of data gives a list of numbers new_data, which I want to add to the big set: all_data.update(new_data). Now while the computational part is easily made parallel by means of multiprocessing.Pool.map, the update part is slow.

Obviously, there is a problem if one has two identical elements in new_data, that are absent in all_data, and trying to add them at the same moment. But if we assume new_data to be a set as well, is there still a problem? The only problem I can see is the way sets are organized in memory, so the question is:

Is there a way to organize a set structure that allows simultaneous addition of elements? If yes, is it realized in Python?

Andrei Smolensky
  • 257
  • 2
  • 12
  • `multiprocessing` provides primitives for sharing data and synchronization. Take a look at [this](http://stackoverflow.com/a/17393879/4371276) – dmg Jan 13 '15 at 10:13

1 Answers1

1
  • In pure Python, no. Due to the GIL, all Python code (including manipulations with built-in structures) is single-threaded. The very existence of GIL is justified by eliminating the need to lock access to them.

    • Even the result fetching in multiprocessing.Pool.map is already sequential.
  • ParallelProgramming article in SciPy wiki outlines related options on parallel code but I didn't find anything directly regarding off-the-shelf concurrent data structures.

    • It does, however, mention a few C extensions under "Sophisticated parallelization" which do support parallel I/O.
  • Note that a set is actually a hash table which by its very nature cannot be updated in parallel (even with fine-grained locking, sub-operations of each type (look-up, insertion incl. collision resolution) have to be sequenced). So you need to either

    • replace it with some other data organization that can be parallelized better, and/or
    • speed up the update operations, e.g. by using shared memory
Community
  • 1
  • 1
ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152