I have a look-up object, specifically a pre-trained word2vec model from `gensim.models.keyedvectors.Word2VecKeyedVectors`. I need to do some data pre-processing and I am using multiprocessing for it. Is there a way for all of my processes to use the object from the same memory location, instead of each process loading the object into its own memory?

angryweasel
2 Answers
Yes, if:
- the files were saved using Gensim's internal `.save()` method, and the relevant large arrays of vectors are clearly separate `.npy` files
- the files are loaded using Gensim's internal `.load()` method, with the `mmap` option
- you avoid doing any operations which inadvertently cause each process's object to reallocate the backing array completely (breaking the mmap-sharing).
See this prior answer for an overview of the steps/concerns of a similar need.
(The concern and extra steps listed there to avoid breaking the mmap-sharing, namely manual patch-ups of the norm properties, should no longer be necessary in Gensim 4.0.0, currently available only as a prerelease version.)
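As a rough illustration of the conditions above, here is a minimal sketch in which each worker of a `multiprocessing.Pool` opens the same saved file memory-mapped and read-only. It assumes the vectors were written earlier with `.save()`; the function names (`init_worker`, `known_tokens`, `preprocess`) and the membership-filter task are hypothetical placeholders for your own pre-processing, not anything from Gensim itself.

```python
import multiprocessing as mp

_kv = None  # per-process handle to the vectors, set by init_worker()


def init_worker(path):
    """Pool initializer: open the saved vectors memory-mapped, read-only."""
    global _kv
    # Imported here so that each worker performs its own mmap load.
    from gensim.models import KeyedVectors
    # `path` must point at a file previously written with KeyedVectors.save().
    _kv = KeyedVectors.load(path, mmap="r")


def known_tokens(tokens, kv=None):
    """Illustrative read-only task: keep the tokens that have a vector."""
    kv = _kv if kv is None else kv
    return [t for t in tokens if t in kv]


def preprocess(docs, path, processes=4):
    """Run known_tokens over docs in a pool whose workers mmap the same file."""
    with mp.Pool(processes, initializer=init_worker, initargs=(path,)) as pool:
        return pool.map(known_tokens, docs)
```

Call `preprocess(docs, path)` from under an `if __name__ == "__main__":` guard, with `docs` as a list of token lists and `path` as the location you saved to. The task here only reads from the vectors; anything that rewrites or recomputes the arrays in a worker risks breaking the sharing, as noted in the third condition.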

gojomo