
I'm trying to write a crawler in Node.js. The crawler constantly scans links, collects information from them, does calculations, ranking, scraping, etc.

The only problem is that the memory usage continually grows, and never gets collected.

I've tried setTimeout, process.nextTick(), setting variables to null, declaring global variables and reusing them to avoid garbage, etc.

The only effective way has always been restarting the app.

Is there a way to force garbage collection in production?
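For reference, V8 does expose a manual GC hook, but only behind a startup flag. A minimal sketch (the `--expose-gc` flag and `global.gc()` are standard Node.js/V8 behavior; the logging is just illustrative):

```javascript
// Run with: node --expose-gc app.js
// global.gc() only exists when Node is started with the --expose-gc flag;
// otherwise it is undefined.
if (typeof global.gc === 'function') {
  const before = process.memoryUsage().heapUsed;
  global.gc(); // ask V8 for a full collection cycle
  const after = process.memoryUsage().heapUsed;
  console.log(`heapUsed: ${before} -> ${after} bytes`);
} else {
  console.log('Start node with --expose-gc to enable global.gc()');
}
```

Note that, as the comments below point out, this only reclaims memory the collector already considers unreachable; it does nothing for objects your code is still (perhaps accidentally) holding references to.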

Wandering Fool
user2988332
  • If you have memory leaks it's because the garbage collector doesn't know to garbage-collect some objects, not because the garbage collector isn't running enough. If you force the garbage collector to run, it's still not going to garbage-collect those objects. The only way to fix memory leaks is to fix the cause, or kill the process and restart it. – Jordan Running Aug 09 '15 at 17:19
  • It's not really a memory leak; the crawler sometimes has memory peaks because of the treatment of data, but then the memory never goes down again. – user2988332 Aug 09 '15 at 17:22
  • To my knowledge, you cannot force garbage collection in JavaScript. What @Jordan said is true. You'll need to fix your code. Please see this question (where someone was helped to fix their code that had garbage-collection problems): http://stackoverflow.com/questions/3034179/javascript-force-gc-collection-forcefully-free-object – dannypaz Aug 09 '15 at 17:29
  • There are lots of things that could cause this, and it is unlikely we can help without seeing your entire code to look for suspects. It is also complicated to sort out what is causing a real leak vs. what is causing a temporary memory usage that will actually be reused later. As others have said, as long as you are using async I/O, it is unlikely that calling the GC manually is a solution to anything. – jfriend00 Aug 09 '15 at 17:31
  • @dannypaz Thanks, I've already seen that one, but I'm not sure it answers the question. The problem is that the crawler is running an "infinite loop", and hence the garbage never gets collected. If you have any ideas on how to write an infinitely running task/loop in Node.js, I'll take it ;) – user2988332 Aug 09 '15 at 17:33
  • "The memory never goes down again" is the very definition of a memory leak. GC cannot and will not help with that. – Jordan Running Aug 09 '15 at 17:35
  • Here's a good article (well, series of articles) that discusses tools you can use to find and eliminate memory leaks in Node.js: http://www.willvillanueva.com/the-node-js-profiling-guide-that-hasnt-existed-profiling-node-js-applications-part-1/ – Jordan Running Aug 09 '15 at 17:47
  • @Jordan, does it behave as a memory leak just because it actually never ends? – user2988332 Aug 09 '15 at 18:20
  • If your program's memory use never stops growing, and you, the designer of the program, don't know why (i.e. you did not design it to do that), then it's a memory leak. Whether or not the program ever ends is immaterial. (And there are plenty of long-running programs out there with memory leaks that never get fixed—some developers are perfectly satisfied just restarting the program once a month or day or hour or what have you.) – Jordan Running Aug 09 '15 at 18:28
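Regarding the "infinite loop" discussed in the comments above: one hedged sketch of a restructuring is to schedule each crawl step with setImmediate rather than a blocking while (true), so control returns to the event loop (and V8 can garbage-collect) between iterations. The `crawlNext` function and the queue contents here are hypothetical stand-ins, not from the original code:

```javascript
// Hypothetical sketch: instead of a blocking while (true) loop, schedule
// each crawl step with setImmediate so the event loop (and the garbage
// collector) get a chance to run between iterations.
let processed = 0;

function crawlNext(queue) {
  const url = queue.shift();
  if (!url) {                           // queue drained: stop the loop
    console.log(`processed ${processed} urls`);
    return;
  }
  processed += 1;                       // stand-in for fetch/scrape/rank work
  setImmediate(() => crawlNext(queue)); // yield before the next step
}

crawlNext(['http://a.example', 'http://b.example', 'http://c.example']);
```

This alone won't fix a true leak (objects still referenced stay alive either way), but it does let the collector run during long crawls instead of only after the loop ends.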

0 Answers