
I'm executing a phantomjs script about 15,000 times, over and over, for different webpages. I'm using a queue system I created myself, so I only run 2 processes at a time with the execFile method. (The queue, however, is built all at once, but its size, measured with sizeof, is only about 20 MB.)
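A queue that keeps at most two jobs in flight could look roughly like the sketch below. It is not the author's queue, which isn't shown; `runWithLimit` is a hypothetical helper that takes functions returning promises, so each function could wrap one phantomjs run.

```javascript
// Run promise-returning task functions with at most `limit` in flight.
// Results come back in input order; a failed task is recorded as { error }
// instead of stopping the rest of the queue.
function runWithLimit(tasks, limit) {
  return new Promise((resolve) => {
    const results = new Array(tasks.length);
    let next = 0;     // index of the next task to start
    let running = 0;  // tasks currently in flight
    let done = 0;     // tasks settled so far

    if (tasks.length === 0) {
      resolve(results);
      return;
    }

    function launch() {
      while (running < limit && next < tasks.length) {
        const index = next++;
        running++;
        // Starting the task inside .then() turns a synchronous throw into a rejection.
        Promise.resolve()
          .then(() => tasks[index]())
          .then(
            (value) => { results[index] = { value }; },
            (error) => { results[index] = { error }; }
          )
          .then(() => {
            running--;
            done++;
            if (done === tasks.length) resolve(results);
            else launch();
          });
      }
    }

    launch();
  });
}
```

Each task could, for example, be `() => promisify(execFile)(phantomPath, [scriptPath, url])` using Node's `util.promisify`, where `phantomPath`, `scriptPath` and `url` are placeholders. Recording failures per task, rather than rejecting the whole batch, keeps one bad page from stopping the remaining runs.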

After a while, my nodejs script starts to throw this message back at me:

{ [Error: Command failed: /home/ec2-user/bot/node_modules/phantomjs/lib/phantom/bin/phantomjs /home/ec2-user/bot/phantomjs-script.js http://www.example.com/foo/bar
]
  killed: true,
  code: null,
  signal: 'SIGTERM',
  cmd: '/home/ec2-user/bot/node_modules/phantomjs/lib/phantom/bin/phantomjs /home/ec2-user/bot/phantomjs-script.js http://www.example.com/foo/bar' }
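In Node's `execFile`, `killed: true` together with `signal: 'SIGTERM'` and `code: null` indicates that Node itself sent the signal to the child, for example because the `timeout` option elapsed (`killSignal` defaults to `'SIGTERM'`) or because the calling code called `kill()`; a child killed from outside Node would normally show `killed: false`. The hypothetical helper below is a sketch for telling these cases apart when logging failures.

```javascript
// Hypothetical helper: turn an execFile() error into a short, loggable reason.
function describeExecError(err) {
  if (!err) return 'ok';
  // Node sent the signal itself (e.g. the timeout option or an explicit kill()).
  if (err.killed && err.signal) return `killed by Node with ${err.signal}`;
  // The child died from a signal Node did not send (another process, the OS).
  if (err.signal) return `terminated by ${err.signal} from outside Node`;
  // Normal exit with a non-zero status.
  if (typeof err.code === 'number') return `exited with code ${err.code}`;
  // No exit status: spawn failures ('ENOENT', 'ENOMEM') or Node's own error codes.
  return `failed: ${err.code || err.message}`;
}
```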

If I then restart the script, these messages are thrown right from the start of the nodejs script.
If I log out and back in, I can run the script longer before these messages appear again, much like the first time I ran it.

To me it sounds like a memory/heap/garbage-collection issue, but I can't find any information on what to do about it, only different methods for measuring how much memory is used. The conclusion is that the script uses a very large amount of memory, between 20 and 500 MB, as measured with memoryUsage.
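Note that `process.memoryUsage()` only reports the Node process itself; each phantomjs child is a separate process, so its memory does not appear in these numbers. A small sampler like the one below (all function names are hypothetical) can log the Node side over time, to see whether `heapUsed` keeps growing across runs or levels off.

```javascript
// Convert bytes to megabytes, rounded to one decimal place.
function toMB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// One snapshot of this Node process's memory, in MB (children are not included).
function memorySnapshotMB() {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return { rss: toMB(rss), heapTotal: toMB(heapTotal), heapUsed: toMB(heapUsed) };
}

// Log a snapshot every intervalMs; unref() keeps the timer from holding the process open.
function startMemoryLogger(intervalMs) {
  const timer = setInterval(() => {
    console.log(new Date().toISOString(), memorySnapshotMB());
  }, intervalMs);
  timer.unref();
  return timer;
}
```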

This eventually causes the following error to be thrown:

child_process.js:1155
    throw errnoException(err, 'spawn');
          ^
Error: spawn ENOMEM
    at exports._errnoException (util.js:746:11)
    at ChildProcess.spawn (child_process.js:1155:11)
    at exports.spawn (child_process.js:988:9)
    at Object.exports.execFile (child_process.js:682:15)
    at /home/ec2-user/bot/main.js:66:20
    at worker (/home/ec2-user/bot/main.js:20:9)
    at wrapper [as _onTimeout] (timers.js:265:14)
    at Timer.listOnTimeout (timers.js:110:15)

The error is explained in this question: "Node.js catch ENOMEM error thrown after spawn". The solution there seems to be to expand your memory, but I really don't think I should need more than 500 MB of memory to read 2 webpages at a time.
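As the stack trace shows, `ENOMEM` from `spawn` is thrown synchronously out of `execFile()` rather than passed to its callback, so an unguarded call takes down the whole script. A minimal sketch (the name `execFileSafe` is hypothetical) that routes both kinds of failure into one rejected promise is below; it only keeps the queue alive and does nothing to reduce memory use.

```javascript
const { execFile } = require('child_process');

// Hypothetical wrapper: resolve with { stdout, stderr }, reject on any failure.
function execFileSafe(file, args, options = {}) {
  return new Promise((resolve, reject) => {
    try {
      execFile(file, args, options, (err, stdout, stderr) => {
        // Non-zero exit, signal, timeout, or an asynchronously reported spawn error such as ENOENT.
        if (err) reject(err);
        else resolve({ stdout, stderr });
      });
    } catch (err) {
      // Errors that spawn throws synchronously, such as the ENOMEM in the trace above.
      reject(err);
    }
  });
}
```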

I realize my issue could be local; unfortunately I have no code to show you where I think the problem could be. To me, it all looks good, and the script is too big to post here.

I'm pretty lost here and have been trying to figure it out for 2 days now without any success, so I felt compelled to ask whether anyone else has had problems with phantomjs or child processes as described above, and if so, what can I do about it?

superhero
  • Hello, Eric, have you found a way to solve or go around this issue? – Vaviloff Nov 30 '15 at 06:39
  • I can only offer you an incomplete understanding of the problem and a solution. Don't use child processes to communicate between the two. I'm using docker and setting up phantomjs as a server http://phantomjs.org/api/webserver/method/listen.html to communicate over plain HTTP between the services. It works a lot better this way, though I still have issues with GC in some third-party nodejs modules for parsing and extracting the content. To solve this I use a synchronous child process for that part. I'm still working on a good architecture for the logic, however. – superhero Nov 30 '15 at 09:48
  • @Vaviloff I'd love to chat more with you if you run into anything, or if you just want to trade some ideas on the matter. – superhero Nov 30 '15 at 09:50
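The phantomjs-as-a-server setup described in the comments could be called from Node roughly like this. It assumes a phantomjs script that uses the `webserver` module's `listen()` on a local port and accepts the target page as a `url` query parameter; the port, the query format and the name `fetchFromPhantom` are all hypothetical, and the phantomjs side is not shown.

```javascript
const http = require('http');

// Hypothetical client: ask a local phantomjs webserver to process pageUrl
// and resolve with the response body. Assumes the "/?url=" request format.
function fetchFromPhantom(port, pageUrl) {
  return new Promise((resolve, reject) => {
    const path = '/?url=' + encodeURIComponent(pageUrl);
    const req = http.get({ host: '127.0.0.1', port, path }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        if (res.statusCode === 200) resolve(body);
        else reject(new Error(`phantomjs server responded with ${res.statusCode}`));
      });
    });
    // Connection errors, e.g. the phantomjs server is not running.
    req.on('error', reject);
  });
}
```

With one long-lived phantomjs process answering every request, the Node script no longer spawns a new process per page, which is the step that failed with `ENOMEM` above.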
