I am currently working on a project that uses Node.js as the controlling system for some relatively large-scale machine learning with images. I am running out of memory fairly quickly, even though I am trying to optimize usage as much as possible and my data should not take up an excessive amount of space. My code relies heavily on promises and anonymous functions to manage the pipeline, and I am wondering whether that is why I am seeing such high memory usage in my test case.
For context, I am using the MNIST dataset for my testing, which can be found here. The training set consists of 60,000 20x20 images. From these I am extracting OverFeat features; a description of this can be found here. What this boils down to is a 4,096-element array for each image, so 60,000 of them in total. I am caching all the image and feature data in Redis.
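To give an idea of what I mean by caching the feature data, here is a simplified sketch of the shape of that step (this is not my actual code; the key format is a placeholder, and it assumes the classic callback-style `redis` client):

```javascript
// Simplified sketch of the caching step (not my real code; names are placeholders).
// Writing the 4,096 doubles as raw bytes keeps each Redis entry at a predictable
// 4096 * 8 = 32,768 bytes per image.
const redis = require('redis');
const client = redis.createClient();

function cacheFeatures(imageId, features) {
  // `features` is assumed to be a Float64Array of length 4096
  const buf = Buffer.from(features.buffer, features.byteOffset, features.byteLength);
  return new Promise((resolve, reject) => {
    client.set('features:' + imageId, buf, (err) => (err ? reject(err) : resolve()));
  });
}
```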
A quick computation tells me that the full feature set should come to 4096 * 8 * 60000 = 1,966,080,000 bytes, or approximately 1.83GB of memory, assuming each element of the array is a 64-bit JavaScript number. The images themselves should only take up a very small amount of space, and I am not storing them in memory anyway. However, when I run my code I am seeing more like 8-10GB of memory used after extraction/loading. When I try to do further operations on this data (like printing it all out to a JSON file so I can make sure the extraction worked correctly), I quickly consume the 16GB of memory available on my machine, crashing the Node script.
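To sanity-check that estimate against what the process actually holds, here is the kind of standalone baseline I have in mind (not part of my project, just a throwaway script that allocates 60,000 vectors of 4,096 doubles and prints `process.memoryUsage()`):

```javascript
// Throwaway baseline: allocate 60,000 feature-sized vectors (4,096 doubles each)
// and report how much memory the process ends up holding.
// Depending on the Node version you may need a raised heap limit, e.g.
//   node --max-old-space-size=4096 baseline.js
const vectors = [];
for (let i = 0; i < 60000; i++) {
  vectors.push(new Float64Array(4096)); // 4096 * 8 = 32,768 bytes of raw data each
}

const usage = process.memoryUsage();
console.log('heapUsed: ' + (usage.heapUsed / 1048576).toFixed(1) + ' MB');
console.log('external: ' + (usage.external / 1048576).toFixed(1) + ' MB');
console.log('rss:      ' + (usage.rss / 1048576).toFixed(1) + ' MB');
```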
So my general question is: why am I seeing such high memory usage? Is it because of my heavy use of promises/closures? Can I refactor my code to use less memory and allow more variables to be garbage collected?
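For reference, the pattern I use all over the pipeline looks roughly like this (a contrived stand-in; the helpers here are trivial stubs for my real image-loading, feature-extraction, and caching code):

```javascript
// Contrived stand-in for a typical pipeline step; these stubs replace my real helpers.
const loadImage = (id) => Promise.resolve({ id });
const extractFeatures = (image) => Promise.resolve(new Float64Array(4096));
const cacheFeatures = (id, features) => Promise.resolve();
const writeDebugOutput = (id, features) => Promise.resolve();

// My concern is that `features` stays reachable through the nested closure until the
// whole chain settles, so large arrays may live much longer than I expect.
function processImage(imageId) {
  return loadImage(imageId)
    .then((image) => extractFeatures(image))
    .then((features) =>
      cacheFeatures(imageId, features)
        .then(() => writeDebugOutput(imageId, features)) // `features` captured here
    );
}

processImage(0).then(() => console.log('done'));
```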
The code is available here for review. Be aware that it is a little rough as far as organization goes.