
I am currently writing a custom PyTorch dataloader that loads the training data for machine learning from a 2GB JSON file.

The dataloader, which is basically a CPython extension module written in C++, loads the entire JSON file into a temporary data structure and converts the data into another in-memory format which is compatible with the PyTorch model I'm using.

I've managed to make the program load and convert the data at a reasonable speed thanks to the brilliant free libraries out there, but the program turned out to consume too much memory when I tried to scale up the training.

When PyTorch performs multi-GPU/multi-node training, the library allocates one Python process for each GPU, which means there are several separate instances of my dataloader running at the same time.

My machine has enough RAM for one dataloader to run without problems, but not enough RAM to run several of them.

I confirmed that once RAM is exhausted, the dataloaders start using several GB of swap space, which degrades performance severely.

So I started looking for places where I could save some RAM in my dataloader.

I found that the temporary data structure into which the JSON data is initially loaded becomes entirely unnecessary once the conversion is finished, so I want to free up this memory for the other processes.

The question is, how am I supposed to do this with the standard library? The data structure basically consists of std::vectors and std::unordered_maps on the heap, but simply destroying them does not return the memory to the OS, because glibc's malloc has no heap compaction mechanism on Linux.

On Windows I could implement a custom std::allocator that allocates from a separate heap (e.g. created with HeapCreate) and simply destroy the entire heap after use (though I'm not sure this actually works), but glibc's malloc() does not take a heap handle parameter.

I doubt I'm the first to ask this question, and I don't think implementing a custom std::allocator on top of a third-party heap allocator is the only answer. How can I free up heap space for other processes on Linux with glibc? Could you give me some pointers?

Thanks in advance.

  • [What's the difference between “STL” and “C++ Standard Library”?](https://stackoverflow.com/questions/5205491/whats-the-difference-between-stl-and-c-standard-library) – François Andrieux Feb 05 '20 at 14:36
  • [I couldn't resist](https://xkcd.com/138/). – François Andrieux Feb 05 '20 at 14:39
  • 1
    Possible duplicate : [Force free() to return malloc memory back to OS](https://stackoverflow.com/questions/27945855/force-free-to-return-malloc-memory-back-to-os). Looks like there is a `malloc_trim` in glibc you can try. – François Andrieux Feb 05 '20 at 14:41
  • 1
    As a possible alternative... is it possible to provide the dataloader through some method as a shared object to all of the python instances? Or at a minimum, have your dataloader class be smart enough to use static data structures to prevent reloading of the data for every instance? Beware of multi-threading concerns of course if you were to do so. – Darinth Feb 05 '20 at 15:47
  • I think you are taking the wrong approach to this problem. Ask yourself: Can I avoid using one dataloader instance per process? Do I need the temporary data structure, or can I do the conversion while loading the file? If I do need the temporary, can I maybe preprocess the data in a separate process and create an on-disk format that is readable without the temporary (maybe mmap?)? To help you with these questions, we need more detailed information about your code. @ShenpaiYajulight – n314159 Feb 05 '20 at 15:56
