
I'm writing a web crawler using Chickenfoot and need to save PDF files. I can either click the link on the page or grab the PDF's URL and use

go("http://www.whatever.com/file.pdf") 

and I get the Firefox "Opening file.pdf" dialog box, but I can't click the "OK" button to actually save the file.

I've tried other ways to download the files (wget, Python's urllib2, twill), but the PDF files are gated, so none of those work.

Any help is appreciated.

alaiacano

3 Answers


This example in the Mozilla developer documentation of how to save a target looks like it should do exactly what you want. I've tested a very similar example in Chickenfoot, one that gets the temp environment variable, and it worked well for me.

https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWebBrowserPersist#Example
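For reference, here is a minimal sketch of that approach, not a tested recipe. It assumes a privileged script context (such as Chickenfoot) where the XPCOM global `Components` is available, and it uses the Firefox 3-era six-argument form of `saveURI`; later Firefox versions changed that signature. The helper name `savePdfWithPersist` and the paths in the usage note are made up for illustration.

```javascript
// Sketch only: assumes a privileged context (such as a Chickenfoot script)
// where the XPCOM global `Components` is available.
// savePdfWithPersist is a hypothetical helper name, not a Chickenfoot command.
function savePdfWithPersist(url, targetPath) {
  var Cc = Components.classes;
  var Ci = Components.interfaces;

  // Turn the URL string into an nsIURI.
  var ioService = Cc["@mozilla.org/network/io-service;1"]
      .getService(Ci.nsIIOService);
  var uri = ioService.newURI(url, null, null);

  // Point an nsILocalFile at the destination (native path format).
  var file = Cc["@mozilla.org/file/local;1"]
      .createInstance(Ci.nsILocalFile);
  file.initWithPath(targetPath);

  // Create the persist object and allow it to overwrite an existing file.
  var persist = Cc["@mozilla.org/embedding/browser/nsWebBrowserPersist;1"]
      .createInstance(Ci.nsIWebBrowserPersist);
  persist.persistFlags =
      Ci.nsIWebBrowserPersist.PERSIST_FLAGS_REPLACE_EXISTING_FILES;

  // Firefox 3-era argument order:
  // saveURI(uri, cacheKey, referrer, postData, extraHeaders, file)
  persist.saveURI(uri, null, null, null, "", file);

  // The download is asynchronous, so the file may not be complete yet.
  return file;
}
```

A call might look like `savePdfWithPersist("http://www.whatever.com/file.pdf", "C:\\temp\\file.pdf");`. Because the request is made by the browser itself, it should carry the same session as the page, but I have not verified that against a gated site.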

You might have to play with the application associations in Tools, Options, Applications to make sure the action is set to Save File, but those settings might not apply to these functions.

End Answer, begin related grumblings...

I sure wish someone would fix the many bugs in Chickenfoot and write a nice cookbook-style programming guide. I've been using it for years, and there are still many basic things I've not been able to figure out how to do. I finally broke down and subscribed to the mailing list, as the archives have some decent script examples. It takes a lot of searching through the PDF references, blogs, etc., as the web API reference is very sparse. I love how simple Chickenfoot can make automating some tasks, but since I'm not really a web programmer, it takes me days of searching JavaScript, DOM, and Firefox documentation to find ways to do some of the things it can't. The goal of Chickenfoot seems to be that I shouldn't have to be a web programmer, but unfortunately few people are refining the proof of concept now that MIT has dropped the project.

I tried to do this several ways using only Chickenfoot commands and confirmed they don't work with the latest Firefox 3 and Chickenfoot 1.0.7.

I hope this helps! Good luck. Sorry I only ran across your question yesterday, but found it too interesting to leave alone.

MSZ

This has worked for me to save Excel files from the NCES portal.

http://muaz-khan.blogspot.com/2012/10/save-files-on-disk-using-javascript-or.html

I was using Firefox 3.0 and the "old syntax" version of the code. I also stripped code intended for IE and "(window.URL || window.webkitURL).revokeObjectURL(save.href);" which generated an error.

myudelson

For security reasons, you won't be able to click on Firefox dialogs. The best way to download the content of a URL is to read it and then write it to a file.

// Chickenfoot 1.0.7 JavaScript code to download the content of a URL.
include( "fileio.js" ); // enables the write function.
var url = "http://google.com",
    saveFileTo = "c://chickenfoot-google.com";

// read() fetches the content of the URL; write() saves it to the given path.
write( saveFileTo, read( url ) );

You might find it helpful to use jQuery with Chickenfoot. http://groups.csail.mit.edu/uid/chickenfoot/scripts/index.php?title=Using_jQuery,_jQuery_UI_and_similar_libraries

Larry Battle

Larry, that seems to work fine for saving the source of a webpage, but it seems to corrupt PDF files and any other binary file I've tried. I read up a little on jQuery but haven't come across a solution to this problem there either. – alaiacano Dec 08 '10 at 15:46
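
The corruption described in this comment is consistent with binary data being decoded as text somewhere along the way, though that is a guess rather than something confirmed in the thread. One older-browser workaround is to request the file with the `x-user-defined` charset and keep only the low byte of each character. The sketch below shows only that fetch-and-decode step; it assumes `XMLHttpRequest` is usable from the script context, and the helper names `binaryStringToBytes` and `fetchBytes` are hypothetical. Writing the resulting bytes to disk is not shown.

```javascript
// Sketch only. binaryStringToBytes and fetchBytes are hypothetical helper names.

// Convert text received with charset=x-user-defined into an array of byte values.
// With that charset each byte arrives as one character, and the low 8 bits of
// its character code hold the original byte.
function binaryStringToBytes(text) {
  var bytes = [];
  for (var i = 0; i < text.length; i++) {
    bytes.push(text.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// Fetch a URL synchronously and return its body as an array of byte values.
// Assumes XMLHttpRequest is available in the script context.
function fetchBytes(url) {
  var request = new XMLHttpRequest();
  request.open("GET", url, false); // false = synchronous
  // Must be set before send() so the response is not decoded as real text.
  request.overrideMimeType("text/plain; charset=x-user-defined");
  request.send(null);
  return binaryStringToBytes(request.responseText);
}
```

A quick sanity check on the result is that a PDF should start with the bytes for `%PDF`.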