
I am currently working on a project that uses node.js as the controlling system for some relatively large-scale machine learning with images. I am running out of memory pretty quickly, even though I am trying to optimize usage as much as possible and my data should not take up an excessive amount of space. My code relies heavily on promises and anonymous functions to manage the pipeline, and I am wondering if that is why I am seeing such high usage in my test case.

Just for context, I am using the MNIST dataset for my testing, which can be found here. The training set consists of 60,000 20x20 images. From these I am extracting OverFeat features, described here. What this boils down to is a 4,096-element array for each image, so 60,000 of them. I am caching all the image and feature data in Redis.

A quick computation tells me that the full feature set should be 4096 * 8 * 60000 = 1,966,080,000 bytes, or approximately 1.83 GB of memory, assuming that each element of the array is a 64-bit JavaScript number. The images themselves should only take up a very small amount of space, and I am not storing them in memory. However, when I run my code I am seeing more like 8-10 GB of memory used after extraction/loading. When trying to do more operations on this data (like printing it all out to a JSON file so I can verify the extraction worked), I quickly consume the 16 GB of available memory on my computer, crashing the node script.
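The arithmetic above can be sketched in a few lines. Using a `Float64Array` here is my own suggestion for a sanity check, not necessarily what my code does: a typed array stores raw 64-bit doubles, so its byte length is exactly elements * 8, without the per-element overhead a plain JS array of numbers can incur in V8.

```javascript
// Sanity-check the expected memory footprint of the feature set.
const FEATURES_PER_IMAGE = 4096;
const NUM_IMAGES = 60000;

// A Float64Array element is exactly 8 bytes of raw storage.
const bytesPerVector = FEATURES_PER_IMAGE * Float64Array.BYTES_PER_ELEMENT; // 32768
const totalBytes = bytesPerVector * NUM_IMAGES;

console.log(totalBytes);                                      // 1966080000
console.log((totalBytes / (1024 ** 3)).toFixed(2) + ' GiB');  // 1.83 GiB

// One feature vector as a typed array: exactly 32 KiB, no boxing overhead.
const vector = new Float64Array(FEATURES_PER_IMAGE);
console.log(vector.byteLength); // 32768
```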

So my general question is: why am I seeing such high memory usage? Is it because of my heavy use of promises/closures? Can I refactor my code to use less memory and allow more variables to be garbage collected?

The code is available here for review. Be aware that it is a little rough as far as organization goes.

Max Ehrlich
  • Why do you `throw` all errors, instead of rejecting the respective promise? That might lead to ever-pending promises, which *do* eat up memory. – Bergi Nov 05 '14 at 00:44
  • Also, I can spot a [deferred antipattern](https://stackoverflow.com/questions/23803743/what-is-the-deferred-antipattern-and-how-do-i-avoid-it), which might lead to similar errors. – Bergi Nov 05 '14 at 00:50
  • @Bergi I thought that throwing the errors would reject the promise for me. Also (to my knowledge) nothing is erroring out. – Max Ehrlich Nov 05 '14 at 01:23
  • @Bergi Can you point out where you are seeing the deferred antipattern? I went through a lot of effort to avoid exactly that. – Max Ehrlich Nov 05 '14 at 01:25
  • No, you need to do `reject(err);` instead of `throw err`. See also [this question](http://stackoverflow.com/q/22519784/1048572) – Bergi Nov 05 '14 at 01:29
  • Your `extractFeatures` function with its various subfunctions seems to use the antipattern. – Bergi Nov 05 '14 at 01:31
  • @Bergi Well this is confusing, the answer you linked does say that but the reference I was using (http://www.html5rocks.com/en/tutorials/es6/promises/#toc-error-handling) says otherwise. (scroll down to the section 'JavaScript exceptions and promises'). I have half a mind to throw an exception intentionally just to see what happens. – Max Ehrlich Nov 05 '14 at 02:03
  • @Bergi Yeah I think you may be right about `extractFeatures`, that function is a bit of a mess at the moment. Would this explain the massive discrepancy between what I am computing my memory usage should be and what it actually is though? Is there anything inherent to promises or the way I am using closures that might explain this? – Max Ehrlich Nov 05 '14 at 02:04
  • You might want to also see if your assumptions about the memory usage of your large array is accurate. It wouldn't surprise me if there was some "type" information stored along with the actual number since JS isn't hard typed, the interpreter somewhere has to be able to see what type of value is in every array slot. Also, keep in mind that converting a very large data structure to JSON could temporarily use 2x-4x the amount of memory compared to just the JS array. – jfriend00 Nov 05 '14 at 02:39
  • @jfriend00 That's a great thought about the type information, I hadn't thought about that. As far as the JSON structure, I'm specifically trying to avoid JSON for that reason though it would be nice the verify that my features look right. – Max Ehrlich Nov 05 '14 at 03:03
  • With an appropriate library, it is possible to "stream" JSON to a file, which is a memory-efficient way to write out a large JSON structure vs. converting the entire JSON in memory. I write node apps on the little tiny Raspberry Pi, so I have to think about this kind of stuff. – jfriend00 Nov 05 '14 at 03:35
  • @jfriend00 That's a great suggestion; do you have a link to the library? Also, what about swapping out some of my plain JavaScript arrays for typed arrays? – Max Ehrlich Nov 05 '14 at 04:28
  • Just what I've seen in a [Google search](https://www.google.com/search?q=npm+stream+json&rlz=1C1TSNP_enUS471US471&oq=npm+stream+json&aqs=chrome..69i57j0.2324j0j7&sourceid=chrome&es_sm=0&ie=UTF-8). – jfriend00 Nov 05 '14 at 04:33
  • You're using a terribly slow promise implementation. Consider Bluebird for performance-sensitive promise work. – Benjamin Gruenbaum Nov 05 '14 at 06:11
  • If (when) you switch to Bluebird, also note you should use `.promisify` or `.promisifyAll` for converting callback APIs - it generates functions in the background and then the JIT can easily compile them - it's much faster than doing `new Promise(...)` – Benjamin Gruenbaum Nov 05 '14 at 09:25
  • @BenjaminGruenbaum I am definitely switching to Bluebird; I will update with how that goes. – Max Ehrlich Nov 05 '14 at 19:43
  • @BenjaminGruenbaum Bluebird has allowed me to drastically reduce my memory usage; if you want to post that as an answer, I will accept it. I think I can still do better with my closures, though. – Max Ehrlich Nov 09 '14 at 15:09
  • @MaxEhrlich I've added an answer. – Benjamin Gruenbaum Nov 10 '14 at 07:11
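The throw-vs-reject disagreement in the comments above comes down to where the throw happens. A sketch with hypothetical functions (not from the question's code): a synchronous throw inside the executor is caught by the promise machinery and rejects the promise, but a throw inside an asynchronous callback is not caught, leaving the promise pending forever; that is exactly the kind of leak Bergi describes.

```javascript
// Synchronous throw inside the executor: caught, promise rejects. This is
// the case the html5rocks article describes.
function syncThrow() {
  return new Promise((resolve, reject) => {
    throw new Error('sync'); // caught by the Promise machinery -> rejected
  });
}

// Throw inside an async callback: NOT caught by the Promise machinery.
// The promise never settles and anything chained on it waits forever.
// (Defined for illustration; calling it would crash the process.)
function asyncThrow() {
  return new Promise((resolve, reject) => {
    setImmediate(() => {
      throw new Error('async'); // uncaught exception, promise stays pending
    });
  });
}

// The fix Bergi suggests: explicitly reject inside async callbacks.
function asyncReject() {
  return new Promise((resolve, reject) => {
    setImmediate(() => {
      reject(new Error('async')); // promise rejects as expected
    });
  });
}
```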

1 Answer


Your code uses the "promise" library which, to be fair, is quite memory-hungry and was not really built for raw performance. If you switch to Bluebird promises, you can fit considerably more items in RAM, as it will drastically reduce your memory usage.

Here are benchmark results for doxbee-sequential:

results for 10000 parallel executions, 1 ms per I/O op

file                                 time(ms)  memory(MB)
promises-bluebird.js                      280       26.64
promises-then-promise.js                 1775      134.73

And under bench parallel (--p 25):

file                                time(ms)  memory(MB)
promises-bluebird.js                     483       63.32
promises-then-promise.js                2553      338.36

You can see the full benchmark here.

Benjamin Gruenbaum