How to Upload PhantomJS Page Content to S3

Question

I am using PhantomJS 1.9.7 to scrape a web page, and I need to send the returned page content to S3. Currently I save the content to the local file system with the filesystem module included with PhantomJS, and a PHP script scans the directory and ships the files off to S3. I would like to bypass the local filesystem completely and send the files directly from PhantomJS to S3, but I could not find a direct way to do this within PhantomJS.

I toyed with the idea of using the child_process module and passing in the content as an argument, like so:

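var execFile = require("child_process").execFile;
var page = require('webpage').create();
// Note: page.content only holds the scraped markup once a page has been opened.
var content = page.content;

// The arguments must be an array, and the callback receives (err, stdout, stderr).
execFile('php', ['path/to/script.php', content], null, function (err, stdout, stderr) {
   console.log("execFileSTDOUT:", JSON.stringify(stdout));
   console.log("execFileSTDERR:", JSON.stringify(stderr));
});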

which would call a PHP script directly to accomplish the upload. This requires an additional process to run a CLI command, and I am not comfortable with having another asynchronous process running. What I am looking for is a way to send the content directly to S3 from the PhantomJS script, similar to what the filesystem module does with the local filesystem.

Any ideas as to how to accomplish this would be appreciated. Thanks!

score 1 · Answer 1 · edited May 23 '17 at 12:31


You could just create and open another page and point it to your S3 service. Amazon S3 has both a REST API and a SOAP API, and REST seems easier.

For SOAP you will have to build the request manually. The only problem might be a wrong content-type. Support for setting it appears to have been implemented, though I cannot find a reference in the documentation.
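As a rough illustration of the REST route, here is a minimal sketch that opens a second page with a PUT request carrying the scraped markup. It assumes a pre-signed S3 PUT URL has been generated elsewhere (for example by a server-side script), so no secret key lives in the PhantomJS script. `PRESIGNED_PUT_URL`, the bucket name, the scraped URL and the helper `buildPutSettings` are all hypothetical placeholders, not part of the original question or of any S3 library.

```javascript
// Sketch only. Assumption: a pre-signed S3 PUT URL is supplied from elsewhere.
// The bucket, key and query string below are placeholders, not real values.
var PRESIGNED_PUT_URL = 'https://example-bucket.s3.amazonaws.com/page.html?<signed-query>';

// Hypothetical helper: builds the settings object accepted by
// page.open(url, settings, callback) in PhantomJS.
function buildPutSettings(html) {
  return {
    operation: 'PUT',
    encoding: 'utf8',
    // If the URL was signed with a Content-Type, this value has to match it.
    headers: { 'Content-Type': 'text/html' },
    data: html
  };
}

// The upload only makes sense inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var webpage = require('webpage');
  var page = webpage.create();

  // Placeholder URL for the page being scraped.
  page.open('http://example.com/', function (status) {
    if (status !== 'success') {
      console.log('Could not load the page to scrape');
      phantom.exit(1);
      return;
    }

    // Second page object, used purely as an HTTP client for the upload.
    var uploader = webpage.create();
    uploader.open(PRESIGNED_PUT_URL, buildPutSettings(page.content), function (putStatus) {
      console.log('S3 upload request finished with status: ' + putStatus);
      phantom.exit(putStatus === 'success' ? 0 : 1);
    });
  });
}
```

Note that the callback status only says whether a response was loaded, not whether S3 accepted the object. The `onResourceReceived` callback of the uploader page exposes the HTTP status code if you need to check it.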

You could also create a form in the page context and send the file that way.


answered May 02 '14 at 17:40

Artjom B.


Great ideas! I have not successfully implemented them yet, but they give me something to try. I also explored the [Javascript-sdk](https://aws.amazon.com/sdkforbrowser/), which is a dedicated SDK for sending files via the browser. So far I am having some trouble with the Secret Access Key not matching up. Thanks again! – AYTWebSolutions May 06 '14 at 13:27
