46

I'm looking to automate some web interactions, namely the periodic download of files from a secure website. This basically involves entering my username and password and navigating to the appropriate URL.

I tried simple scripting in Python, followed by more sophisticated scripting, only to discover that this particular website uses an obnoxious JavaScript- and Flash-based login mechanism, which renders my methods useless.

I then tried HtmlUnit, but that doesn't seem to work either. I suspect the use of Flash is the issue.

I don't really want to think about it any more, so I'm leaning towards scripting an actual browser to log in and grab the file I need.

Requirements are:

  • Run on a Linux server (i.e. no X running). If I really need to have X I can make that happen, but I won't be happy.
  • Be reliable. I want to start this thing and never think about it again.
  • Be scriptable. Nothing too sophisticated, but I should be able to tell the browser the various steps to take and pages to visit.

Are there any good toolkits for a headless, X-less scriptable browser? Have you tried something like this and if so do you have any words of wisdom?

hippietrail
Parand

6 Answers

40

What about PhantomJS?

Phil
  • PhantomJS is by far the easiest to integrate and is developing ways to integrate with automated tests frameworks. – Edu Felipe May 04 '11 at 13:47
  • Yeah, but if you want to simulate something like logging into a website and using cookies, good luck! – Jay Taylor Jul 08 '11 at 16:10
  • Phantomjs requires an X-server (http://code.google.com/p/phantomjs/issues/detail?id=33) – Peter Prettenhofer Jul 26 '11 at 14:27
  • @pyrony [CasperJS](https://github.com/n1k0/casperjs) solves that problem. :) – John Doe Sep 26 '11 at 04:56
  • Still requires X11 to be installed even though issue 33 was closed. http://code.google.com/p/phantomjs/issues/detail?id=163 – Vizjerai Sep 28 '11 at 14:09
  • PhantomJS now supports persistent cookies, and does not require an X11 server (not sure when, but the latest 1.5.0 works fine on my headless linux server) – davr Apr 02 '12 at 21:39
  • PhantomJS does not support Flash, since version 1.5 http://phantomjs.org/release-1.5.html – Jakub M. Jun 17 '13 at 15:19

17

I did a related task with an embedded IE browser (although it was a GUI application with a hidden browser component panel). In fact, you can take any layout engine and cut out the output logic. Navigation should be done by firing script-like events.

You can use Crowbar. It is a headless version of Firefox (Gecko engine). It turns the browser into a RESTful server that accepts requests ("fetch url"): it parses the HTML, represents it as a DOM, and waits for a defined delay so that all scripts can run.

It works on Linux. I suppose you can easily extend it for your goal using JS and the rich capabilities of XULRunner.
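
As a rough sketch of the "fetch url" interaction from a Python script: this assumes a Crowbar instance is listening on `http://127.0.0.1:10000/` and accepts form-encoded `url` and `delay` (milliseconds) fields. Both the endpoint and the field names are assumptions to check against your Crowbar setup, and the helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Crowbar listens locally on port 10000; adjust to your setup.
CROWBAR_ENDPOINT = "http://127.0.0.1:10000/"


def build_fetch_request(page_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT):
    """Build a POST request asking the server to load page_url.

    The `url` and `delay` form fields are assumed parameter names.
    """
    body = urlencode({"url": page_url, "delay": delay_ms}).encode("ascii")
    return Request(endpoint, data=body, method="POST")


def fetch_rendered(page_url, delay_ms=3000, endpoint=CROWBAR_ENDPOINT, timeout=60):
    """Send the request and return the response body (the serialized DOM) as text."""
    request = build_fetch_request(page_url, delay_ms, endpoint)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_rendered("http://example.com/")` would then return whatever the server sends back after the delay has elapsed. Separating request construction from sending keeps the encoding step easy to inspect without a running server.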

Andy Gee
Dmitry
9

Have you tried Selenium? It will allow you to record a usage scenario, using an extension for Firefox, which can later be played back using a number of different methods.
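
A hand-written equivalent of such a recorded scenario might look like the sketch below, which uses Selenium's Python bindings with Firefox in headless mode so that no X display is needed. It assumes the login is a plain HTML form; the field names `username` and `password` are hypothetical placeholders, and a Flash-based login would not be reachable this way.

```python
def make_headless_firefox():
    """Create a Firefox driver that needs no X display (requires `pip install selenium`)."""
    from selenium import webdriver  # imported lazily so the helpers below work without it

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def log_in(driver, login_url, username, password):
    """Fill in and submit a plain HTML login form.

    The field names "username" and "password" are hypothetical placeholders.
    """
    driver.get(login_url)
    driver.find_element("name", "username").send_keys(username)
    password_field = driver.find_element("name", "password")
    password_field.send_keys(password)
    password_field.submit()


def session_cookies(driver):
    """Return the browser's cookies as a dict, e.g. to reuse in a download script."""
    return {cookie["name"]: cookie["value"] for cookie in driver.get_cookies()}
```

A typical run would create the driver with `make_headless_firefox()`, call `log_in(...)`, navigate to the file's URL with `driver.get(...)`, and finish with `driver.quit()`.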

Edit: I just realized this was a very late response. :)

Community
nici
6

Have a look at WebKitDriver. The project includes a headless implementation of WebKit.

Michael Spector
1

I don't know how to do Flash interactions (and am also interested), but for HTML/JavaScript you can use Chickenfoot.

And to get a headless, scriptable browser working on Linux you can use the Qt WebKit library. Here is an example use.

hoju
0

To accomplish this, I just write Chrome extensions that post to CouchDBs (example and its Futon). Add the Couch to the permissions in the manifest to allow cross-domain XHRs.

(I arrived at this thread in search of a headless alternative to what I've been doing; having found this thread, I'm going to try Crowbar at some point.)

Also, considering the bizarre characteristics of this website, I can't help wondering whether you can exploit some security hole to get around the Flash and JavaScript.

Avery Richardson