I threw together a small Lambda function together to crawl a website using the SpookyJS, CasperJS, and PhantomJS toolchain for headless browsing. The task is quite simple, and at some point a few months ago it was working in Lambda. I recently had to change a few things around and wanted to work on the project again, but started fresh and had trouble getting Lambda to run without erroring in any capacity. My question is how can I run phantomjs in Lambda?
The example code I am running is:
spooky.start('http://en.wikipedia.org/wiki/Spooky_the_Tuff_Little_Ghost');
spooky.then(function () {
this.emit('hello', 'Hello, from ' + this.evaluate(function () {
return document.title;
}));
});
spooky.run();
The error I am getting in Lambda is:
{ [Error: Child terminated with non-zero exit code 1] details: { code: 1, signal: null } }
I have followed a variety of procedures to ensure everything is able to run on Lambda. Below is a long list of things I've attempted to diagnose:
- Run locally using
node index.js
and confirm it is working - Upload package.json and the js file to an Amazon Linux EC2 instance for compilation as recommended for npm installation calls and described here
- Run
npm install
on the ec2 instance, and again runnode index.js
to ensure the correct output - zip everything up, and deploy to AWS using the cli
My package.json is:
{
"name": "lambda-spooky-test",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1"
},
"author": "",
"license": "ISC",
"dependencies": {
"casperjs": "^1.1.3",
"phantomjs-prebuilt": "^2.1.10",
"spooky": "^0.2.5"
}
}
I have also attempted the following (most also working locally, and on the AWS EC2 instance, but with the same error on Lambda:
- Trying the non -prebuilt version of phantom
- Ensuring casperjs and phantomjs are accessible from the path with
process.env['PATH'] = process.env['PATH'] + ':' + process.env['LAMBDA_TASK_ROOT'] + ':' + process.env['LAMBDA_TASK_ROOT'] + '/node_modules/.bin'; console.log( 'PATH: ' + process.env.PATH );
Inspecting spawn calls by wrapping child_process's
.spawn()
call, and got the following:{ '0': 'casperjs', '1': [ '/var/task/node_modules/spooky/lib/bootstrap.js', '--transport=http', '--command=casperjs', '--port=8081', '--spooky_lib=/var/task/node_modules/spooky/lib/../', '--spawnOptions=[object Object]' ], '2': {} }
Calling
.exec('casperjs')
and.exec('phantomjs --version')
directly, confirming it works locally and on EC2, but gets the following error in Lambda. The command:`require('child_process').exec('casperjs', (error, stdout, stderr) => { if (error) { console.error('error: ' + error); } console.log('out: ' + stdout); console.log('err: ' + stderr); });
both with the following result:
err: Error: Command failed: /bin/sh -c casperjs
module.js:327
throw err;
^
Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)
at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)
2016-08-07T15:36:37.349Z b9a1b509-5cb4-11e6-ae82-256a0a2817b9 sout:
2016-08-07T15:36:37.349Z b9a1b509-5cb4-11e6-ae82-256a0a2817b9 serr: module.js:327
throw err;
^
Error: Cannot find module '/var/task/node_modules/lib/phantomjs'
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)
at Object.<anonymous> (/var/task/node_modules/.bin/phantomjs:16:15)
at Module._compile (module.js:409:26)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)