
I want to upsert records from a large gzipped CSV file. I am using a generator of chunks, as described in this answer:

def gen_chunks(reader, chunksize=100):
    """ 
    Chunk generator. Take a CSV `reader` and yield
    `chunksize` sized slices. 
    """
    chunk = []
    for i, line in enumerate(reader):
        if (i % chunksize == 0 and i > 0):
            yield chunk
            del chunk[:]
        chunk.append(line)
    yield chunk
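
To illustrate how the generator behaves (the sample data below is made up just for this example), note that it reuses and clears the same list between yields, so each chunk has to be consumed before the next iteration:

import csv
import io

# Made-up sample data, just to exercise gen_chunks.
sample = io.StringIO(
    "mcc,net,area,cell,lat,lon\n"
    + "\n".join("262,1,%d,%d,52.5,13.4" % (i, i) for i in range(7))
)

reader = csv.DictReader(sample)
for chunk in gen_chunks(reader, chunksize=3):
    # The same list object is reused and cleared after each yield,
    # so process (or copy) the chunk before advancing the generator.
    print(len(chunk), [row['cell'] for row in chunk])
# prints chunks of 3, 3 and 1 rows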

I run the mongod daemon with the following command:

$ mongod --dbpath data\db
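
In case it matters, I have not set any cache limit. As far as I know, the WiredTiger cache (which by default can grow to roughly half of the available RAM) could be capped explicitly, for example:

$ mongod --dbpath data\db --wiredTigerCacheSizeGB 2

but I left it at the default.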

And then start a Python script that uses pymongo:

import csv
import gzip

# `locations` is a pymongo collection, created earlier with something like:
# locations = MongoClient()['opencellid']['locations']

with gzip.open(filepath, 'rt', newline='') as gzip_file:
    dr = csv.DictReader(gzip_file)  # comma is the default delimiter
    chunksize = 10 ** 3

    for chunk in gen_chunks(dr, chunksize):
        bulk = locations.initialize_ordered_bulk_op()
        for row in chunk:
            cell = {
                'mcc': int(row['mcc']),
                'mnc': int(row['net']),
                'lac': int(row['area']),
                'cell': int(row['cell'])
            }
            location = {
                'lat': float(row['lat']),
                'lon': float(row['lon'])
            }
            # Upsert: match on the cell identifiers and set the location.
            bulk.find(cell).upsert().update({'$set': {'OpenCellID': location}})
        result = bulk.execute()
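
In case the PyMongo version matters: as far as I know, the initialize_ordered_bulk_op API has been deprecated and later removed in favour of Collection.bulk_write, so the loop above would look roughly like this with the newer API (assuming the same locations collection):

from pymongo import UpdateOne

for chunk in gen_chunks(dr, chunksize):
    requests = []
    for row in chunk:
        cell = {
            'mcc': int(row['mcc']),
            'mnc': int(row['net']),
            'lac': int(row['area']),
            'cell': int(row['cell'])
        }
        location = {'lat': float(row['lat']), 'lon': float(row['lon'])}
        # Upsert: insert the document if no cell matches, otherwise update it.
        requests.append(UpdateOne(cell, {'$set': {'OpenCellID': location}}, upsert=True))
    result = locations.bulk_write(requests, ordered=True)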

Then the RAM used by the mongod process keeps increasing (sorry for my native language in the screenshot; RAM is the third column):

[screenshot of the task manager showing mongod's memory usage]

After the script completes (upserting about 30 million documents), the memory used by mongod reaches about 15 GB!

What am I doing wrong or misunderstanding?

P.S. After restarting the daemon, the RAM usage drops back to normal (about 30 MB).
