
I'm using Celerity in JRuby to automate the download of some .csv files from certain websites. For one of the websites (LinkShare), I've gotten very close but cannot figure out the last step.

The website pushes the file download using JavaScript and the 'hidden iframe' method: during regular browsing, clicking the download button calls JavaScript that creates a hidden iframe containing the download content, and the browser picks that up and prompts the user to save the file.

Obviously, this doesn't work quite the same way in Celerity. After clicking the link I can see the new iframe in jirb, but I can't call any methods on it; I get errors like:

NoMethodError: undefined method `getDocumentElement' for #<Java::ComGargoylesoftwareHtmlunit::TextPage:0x184e6efc>

Does anybody have enough experience with Celerity/HtmlUnit/JavaScript/JRuby to point me in the right direction? I just want to retrieve the download content (the .csv file).
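The error above suggests HtmlUnit loaded the iframe as a `TextPage`, the page class it typically uses for plain-text responses such as CSV, so there is no DOM to call `getDocumentElement` on. `TextPage` should instead expose the body through `getContent()` (callable from JRuby as `content`). Once you have that string, Ruby's standard `csv` library can parse or save it. A minimal sketch, assuming the raw text is already in hand; the sample data, column names, and file path are hypothetical:

```ruby
require 'csv'

# Parse the raw CSV text (e.g. the String from TextPage#getContent)
# into an array of hashes keyed by the header row.
def parse_report(raw_csv)
  CSV.parse(raw_csv, headers: true).map(&:to_h)
end

# Write the raw text to disk unchanged, so the saved file matches the download.
def save_report(raw_csv, path)
  File.write(path, raw_csv)
end

# Hypothetical sample content standing in for the real report.
sample = "date,clicks\n2010-11-01,42\n2010-11-02,17\n"
rows = parse_report(sample)
puts rows.first["clicks"] # prints 42
```

Note that CSV values come back as strings; convert them (e.g. with `Integer`) if you need numbers.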

Alternatively, does anybody know of a headless browser automation tool that would be better suited for this task?

Aaron

3 Answers


Mechanize may work for you. It's meant to more closely resemble a normal person's use of a browser while remaining headless.

http://mechanize.rubyforge.org/

ehsanul

As ehsanul said, Mechanize might be a good starting point. You'll need to figure out which URL is requested to retrieve the file. Also, look for a cookie or session ID that identifies your session to the host. Mechanize should capture it and send it back on later requests, since managing cookies is part of what it does.
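The idea in this answer (find the download URL, then replay the request with your session cookie) can also be sketched with Ruby's standard `net/http`, without any browser emulation for the last step. This is a minimal sketch, not Mechanize's API: the URL and cookie name are hypothetical placeholders, and the real ones would have to be read from the site's traffic after logging in. It does not follow redirects.

```ruby
require 'net/http'
require 'uri'

# Build a GET request for the report URL carrying the session cookie(s)
# captured after logging in. Nothing is sent until fetch_report is called.
def build_download_request(url, cookies)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  # Cookie header format: "name1=value1; name2=value2"
  request['Cookie'] = cookies.map { |name, value| "#{name}=#{value}" }.join('; ')
  request
end

# Send the request and return the response body (the CSV text).
def fetch_report(url, cookies)
  uri = URI(url)
  request = build_download_request(url, cookies)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(request)
    response.value # raises unless the response is a 2xx success
    response.body
  end
end

# Usage with hypothetical values:
# csv = fetch_report('https://example.com/reports/download?id=123',
#                    'SESSIONID' => 'abc123')
```

Splitting request construction from sending makes it easy to inspect the exact headers before hitting the real site.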

the Tin Man

The first thing I'd do is check that you're navigating to the frame. A frame (even an iframe) is treated as a completely separate window, and you'll have to navigate there first. Check the Celerity::Frames class.

Failing that, you may want to try a library that controls a real browser rather than emulating one. Libraries that emulate a browser (such as HtmlUnit and Mechanize) have their limits, and you may have hit one. For this, I'd recommend Watir/FireWatir.

Mark Thomas
I ended up doing something basically like this. I had to do some regex magic on the HTML to put together a URL to follow. – Aaron Nov 04 '10 at 21:24
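Aaron's comment describes pulling the download URL out of the page source and following it directly. A minimal sketch of that step, assuming the URL appears as a quoted path in the iframe's `src` attribute (or a quoted string in the page's script). The markup, the `download.php` path, and the regex are hypothetical and would need to be adapted to the real page. `URI.join` resolves a relative path against the page's own URL.

```ruby
require 'uri'

# Hypothetical pattern: a quoted path containing "download" and a query string,
# e.g. src="/reports/download.php?id=123&amp;type=csv"
DOWNLOAD_PATH = /["']([^"']*download[^"']*\?[^"']*)["']/i

# Return the absolute download URL found in +html+, or nil if none matches.
def download_url(html, page_url)
  match = html.match(DOWNLOAD_PATH)
  return nil unless match
  # HTML attributes encode "&" as "&amp;"; undo that before building the URL.
  path = match[1].gsub('&amp;', '&')
  URI.join(page_url, path).to_s
end

html = %q(<iframe style="display:none" src="/reports/download.php?id=123&amp;type=csv"></iframe>)
puts download_url(html, 'https://example.com/reports/index.php')
# prints https://example.com/reports/download.php?id=123&type=csv
```

A regex is fragile against markup changes; if the page is well-formed, an HTML parser would be the sturdier choice for locating the `src` attribute.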