2

I want to load external content (from another domain) and simulate navigation, doing things like programmatically clicking and filling in forms, probably using jQuery.

To explain a little better: I need to navigate "automatically" through 3 pages. The first one is a login area, where I'm supposed to fill in the login/password fields and submit. On the last one, I must fill in some input fields, submit again, and get all the HTML data from a report.

I was trying to use an iframe and jQuery's contents(), then I realized that I cannot do that because of cross-domain security restrictions (the same-origin policy). (http://jsfiddle.net/TbMyx/4/).

Before trying this approach (client-side, JS, iframe, etc.), I also tried using Java, sending POST/GET requests from a Servlet, and I didn't have any success with that either.

Any thoughts on that? Is it at least a possible task? I'm a little pessimistic: based on my current knowledge I don't think it's really possible, and I just need some confirmation.

Rob Levine
Marcelo Assis
  • I'd use curl on the backend to load the page and then pipe that over to the frontend. Most languages have curl or an equivalent (often just called request or http, or similar). [This post](http://stackoverflow.com/questions/2586975/how-to-use-curl-in-java) may help. – Marshall Feb 28 '12 at 17:58
  • @Marshall, thanks! Your suggestion was the first thing I tried. But I found it very hard to get session cookies, find hrefs, etc. That's why I thought of using JS. Now I'm researching easier ways to do that. – Marcelo Assis Feb 28 '12 at 18:37

5 Answers

2

Yes, it is possible. It's called web scraping, and it is fairly common.

As you have learnt, it is not possible to do this on the client side using JavaScript due to security restrictions.

On the server side, you have two options. a) Load up an actual browser and navigate the website just like a user would, or b) Use a headless browser, which is basically a library that simulates a real browser.

Using a Headless Browser

In general, this is a faster and easier approach, but it may not work for complex websites that depend on JavaScript.

For Java, HtmlUnit is a great library. Keep the Fiddler request/response capture from your browser handy, because it's possible the browser sends cookies or headers that are different from what HtmlUnit sends. In general, if you match all the headers that the browser is sending, the website will respond correctly.

Using an Actual Browser

Use this only if your attempts with a headless browser fail. This approach brings up a browser and navigates the website just like a user does.

You can use Selenium/WebDriver for this purpose. Be warned that running a browser in a server environment is resource-intensive and takes more time.

Sripathi Krishnan
  • As that "mission" was aborted by my manager, I won't be doing it any time soon. But now I've really gotten interested in this. Thanks a lot! – Marcelo Assis Mar 01 '12 at 13:35
1

No, it is not possible with JavaScript unless you reduce your browser's security settings down to the "PLEASE HACK MY BANK ACCOUNT" level.

You can act like a browser on the server as long as they do not call you out for being a bot. That is probably why your posting with Java failed, or else you are not sending the right cookie/session info with the posts. Get Fiddler, monitor the traffic, and try to recreate it.
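
As a rough illustration of "recreating the traffic" from Java, here is a minimal sketch using the JDK's built-in `java.net.http.HttpClient` (Java 11+) with a `CookieManager`, so that the session cookie set by the login response is sent back on later requests. The URLs, the form field names, and the `ReportScraper` class are made up for the example; the real ones, along with any extra headers or hidden fields the site expects, would have to be copied from what Fiddler shows your browser sending.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ReportScraper {

    // One client for the whole session: the CookieManager stores cookies set by
    // the login response and attaches them to every later request.
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Encodes form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds a form POST. The User-Agent value is a placeholder; copy the
    // headers your real browser sends if the site is picky.
    static HttpRequest formPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("User-Agent", "Mozilla/5.0")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
    }

    // Sends the form and returns the response body as a string.
    String submit(String url, Map<String, String> fields)
            throws IOException, InterruptedException {
        return client.send(formPost(url, fields),
                HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        ReportScraper scraper = new ReportScraper();

        // Step 1: log in (hypothetical URL and field names).
        Map<String, String> login = new LinkedHashMap<>();
        login.put("login", "myUser");
        login.put("pass", "myPassword");
        scraper.submit("https://example.com/login", login);

        // Step 2: submit the report form, reusing the session cookie.
        Map<String, String> report = new LinkedHashMap<>();
        report.put("month", "02");
        report.put("year", "2012");
        String html = scraper.submit("https://example.com/report", report);

        System.out.println(html);
    }
}
```

This only covers sites whose forms work with plain HTTP requests. If the pages build their forms or tokens with JavaScript, a headless or real browser is likely still needed.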

epascarello
0

I've used Selenium for interacting with web forms in Java.

Once you get it set up, it's very easy to launch a browser and plug in values into the various input boxes and click buttons in an automated fashion.

Zack Macomber
  • I need this for a software release; I just need to show the data from the 3rd page to the user. – Marcelo Assis Feb 28 '12 at 18:02
  • @MarceloAssis - you can extract the page source also using Selenium and do with it as you wish. I used it to get the contents of a page after I had navigated to it in an automated fashion for unit test purposes. – Zack Macomber Feb 28 '12 at 18:04
0

I know that's not exactly what you want, but give Selenium a try.

Amir Pashazadeh
0

You're right, you can't.

Navigation is possible by changing the URL of the external content (opened windows, frames).

Filling in forms might be possible if you duplicate the (static) forms on your page and post them to the other domain (maybe targeting a hidden iframe).

But you will never get access to the contents of that other page, be it "html data from a report" or DOM elements to "programmatically click".

Bergi