
Following the solution to a related question, I created a Docker container which loads the GoogleNews-vectors-negative300 KeyedVectors inside the container and reads it all into memory:

word_vectors = KeyedVectors.load(model_path, mmap='r')
word_vectors.most_similar('stuff')

Also, I have another Docker container which provides a REST API and loads the same model with:

word_vectors = KeyedVectors.load(model_path, mmap='r')

And I observe that the fully loaded container takes more than 5 GB of memory, and each gunicorn worker takes 1.7 GB of memory.

CONTAINER ID        NAME                        CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
acbfd080ab50        vectorizer_model_loader_1   0.00%               5.141GiB / 15.55GiB   33.07%              24.9kB / 0B         32.9MB / 0B         15
1a9ad3dfdb8d        vectorizer_vectorizer_1     0.94%               1.771GiB / 15.55GiB   11.39%              26.6kB / 0B         277MB / 0B          17

However, I expected all these processes to share the same memory for the KeyedVectors, so that together they would only take 5.4 GB, shared between all containers.
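Note that `docker stats` counts mapped file pages against every container that touches them, so it can't confirm sharing on its own. One way to check directly (a sketch, assuming a Linux kernel with `/proc/<pid>/smaps_rollup`, i.e. 4.14+) is to compare shared vs. private resident memory from inside each container:

```python
# Sketch: print the shared vs. private resident memory of the current
# process. Pages backed by a memory-mapped model file show up under
# Shared_Clean once more than one process maps the same file; if each
# worker instead shows them under Private, the file is not being shared.
with open('/proc/self/smaps_rollup') as f:
    for line in f:
        if line.startswith(('Rss:', 'Shared_Clean:', 'Private_Dirty:')):
            print(line.rstrip())
```

Run this in two containers that map the same file: if the model is genuinely shared, the second mapper's pages appear as Shared_Clean rather than adding to Private memory.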

Has anyone tried to achieve this and succeeded?

edit: I tried the following code snippet and it does indeed share the same memory across different containers.

import mmap
from threading import Semaphore

with open("data/GoogleNews-vectors-negative300.bin", "rb") as f:
    # memory-map the file, size 0 means whole file
    fileno = f.fileno()
    mm = mmap.mmap(fileno, 0, access=mmap.ACCESS_READ)
    # read whole content
    mm.read()
    Semaphore(0).acquire()  # block forever so the mapping stays alive
    # close the map
    mm.close()

So the problem is that KeyedVectors.load(model_path, mmap='r') doesn't share memory.

edit2: Studying gensim's source code, I see that np.load(subname(fname, attrib), mmap_mode=mmap) is called to open the memmapped file. The following code sample does share memory across multiple containers.

from threading import Semaphore

import numpy as np

data = np.load('data/native_format.bin.vectors.npy', mmap_mode='r')
print(data.shape)
# load whole file to memory
print(data.mean())
Semaphore(0).acquire()  # block forever so the mapping stays alive
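The same behavior can be reproduced without the multi-gigabyte vectors file; this is a minimal self-contained sketch (the path is a throwaway temp file, not the real model):

```python
import os
import tempfile

import numpy as np

# Save a small throwaway array, then reopen it memory-mapped, exactly as
# gensim's np.load(..., mmap_mode=mmap) call does for the real vectors file.
path = os.path.join(tempfile.mkdtemp(), 'demo.npy')
np.save(path, np.arange(12, dtype=np.float32).reshape(3, 4))

data = np.load(path, mmap_mode='r')
print(type(data).__name__)  # memmap: pages come from the shared page cache
print(data.shape)           # (3, 4)
print(float(data.mean()))   # 5.5 - reading forces the pages into memory
```

Because the returned object is an np.memmap, its pages live in the kernel page cache and can be shared by every process (or container) mapping the same file.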

2 Answers

After extensive debugging, I figured out that mmap works as expected for the numpy arrays in the KeyedVectors object.

However, KeyedVectors has other attributes, like self.vocab, self.index2word and self.index2entity, which are not shared and consume ~1.7 GB of memory in each process.

  • As noted in the related-answer you linked, & my sibling answer below, the generated `vectors_norm` will only be efficiently shared if you take extra steps to prevent it from being regenerated (which aren't yet shown in your code). The `.vectors` and `.vectors_norm` arrays are the bulk of memory use for most sets-of-word-vectors - but the dicts/lists (like `.vocab` or `.index2entity` in `gensim-3.8`) can't be mmap-shared, and thus will consume memory repeatedly. 1.7GB sounds a bit high for their sizes from `GoogleNews` data, but it's possible - and this should improve a bit in `gensim-4.0.0`. – gojomo Sep 10 '20 at 17:25

I'm not sure if containerization allows containers to share the same memory-mapped files – but even if it does, it's possible that whatever utility you're using to measure per-container memory usage counts the memory twice even if it's shared. What tool are you using to monitor memory usage and are you sure it'd indicate true sharing? (What happens if, outside of gensim, you try using Python's mmap.mmap() to open the same giant file in two containers? Do you see the same, more, or less memory usage than in the gensim case?)

But also: in order to do a most_similar(), the KeyedVectors will create a second array of word-vectors, normalized to unit-length, in property vectors_norm. (This is done once, when first needed.) This normed array isn't saved, because it can always be re-calculated. So for your usage, each container will create its own, non-shared, vectors_norm array - undoing any possible memory savings from shared memory-mapped files.

You can work around this by:

  • after loading a model but before triggering the automatic normalization, explicitly force it yourself with a special argument to clobber the original raw vectors in-place. Then save this pre-normed version:

    word_vectors = KeyedVectors.load(model_path)
    word_vectors.init_sims(replace=True)
    word_vectors.save(normed_model_path)
    
  • when later re-loading the model in a memory-mapped fashion, manually set vectors_norm property to be the same as vectors, to prevent the redundant re-creation of the normed array:

    word_vectors = KeyedVectors.load(normed_model_path, mmap='r')
    word_vectors.vectors_norm = word_vectors.vectors
    

If it's the norming that's preventing you from seeing the memory savings you expect, this approach may help.
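The effect of init_sims(replace=True) plus the vectors_norm aliasing can be illustrated with plain numpy (a sketch with made-up numbers, not gensim's actual implementation):

```python
import numpy as np

# Two toy "word vectors"; init_sims(replace=True) unit-normalizes the raw
# vectors in place instead of allocating a second normed array.
vectors = np.array([[3.0, 4.0], [0.0, 2.0]])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
print(vectors)  # each row now has length 1.0

# Aliasing vectors_norm to vectors means a later most_similar() finds the
# normed array already present and never builds a private per-process copy.
vectors_norm = vectors
print(vectors_norm is vectors)  # True: same array, no extra memory
```

The in-place division is why the pre-normed model must be saved once and reloaded: the raw vector values are clobbered, and only the saved normed file is what every container then maps read-only.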

gojomo
    It is clear that the models don't share memory, as my workstation hangs once I create 4 container instances and comes back to life only after the OOM killer runs – Vitali Vinahradski Aug 06 '18 at 13:03