I've had a search and come up with Rhino and Jaxer as possible solutions, but wanted to put the question out there anyway, as I'm not sure they're quite what I'm after (especially since I have no control over the JavaScript, so I'm unable to add runat="server", for example).

So, I want to call a remote page on a 3rd party site, from my server, and have the javascript executed.

Using cURL, I can easily grab the page and its content, do POSTs, etc., but what I can't do is run the JavaScript.

The suggestions I've had so far range from building a .NET application that calls the URLs in a browser to the options above (Rhino and Jaxer), but I wanted to see if anyone has previous experience of this and, if so, what the possible gotchas are and how you solved the problem.

Cheers,

Mike

Bill the Lizard
Mike
  • JavaScript is a client language. It doesn't run on the server. – Srinivas Reddy Thatiparthy Jun 01 '10 at 12:07
  • @Srinivas: No. JavaScript is a language, not a *client-side* language. I use it on the server all the time. In fact, it was one of the first server-side *scripting* languages (the old Netscape Application Server supported it circa 1997). Every version of IIS has supported it for server-side scripting in ASP and ASP.Net pages. Etc. – T.J. Crowder Jun 01 '10 at 12:09
  • One of the potential problems you face, @Mike, is that the JavaScript on those pages is likely to expect that it will have access to the page DOM, and that the DOM will look the way the code expects it to look. That means that your server-side code will have to parse the surrounding HTML and provide at least some portion of the DOM API for the JavaScript to use. You can cross your fingers and hope that the HTML can be parsed with some server-side XML parser, but that might be a challenge. – Pointy Jun 01 '10 at 12:13
  • @Mike: I have *not* done this, but in order for the JavaScript in the remote page to run properly, it's going to need to think it's running client-side (in fact, you're basically going to be using your server as the client of that other site, if you follow me). That means it'll need to have a DOM to manipulate and such. That suggests you're going to have to run not just Rhino, but nearly a full-on browser. It's probably worth looking at the WebKit or Gecko engines for DOM, and SpiderMonkey (Firefox's JS engine) or V8 (Chrome's) for JavaScript. This is a non-trivial task. – T.J. Crowder Jun 01 '10 at 12:16
  • @Pointy, @T.J. Crowder: OP says he has looked into Jaxer, which is Gecko-based and features not only SpiderMonkey but also the complete DOM API of the Firefox web browser. – pawel Jun 01 '10 at 14:58
  • @pawel well ok then, that sounds awesome and quite appropriate to the purpose! – Pointy Jun 01 '10 at 19:24
  • Brilliant, thanks guys. I can always count on a well-rounded response from SO. – Mike Jun 23 '10 at 14:24
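
The DOM dependency raised in the comments above can be illustrated with a minimal sketch. It assumes a Node.js runtime and its built-in `vm` module, neither of which is mentioned in the thread; `pageScript` and `runPageScript` are hypothetical names, and the one-line script merely stands in for whatever the remote page embeds. It is not a substitute for a real DOM implementation, only a demonstration of why a bare JavaScript engine is not enough.

```javascript
// Sketch only: shows a page script failing in a bare engine and
// succeeding once the piece of the DOM it touches is supplied.
const vm = require('vm');

// Hypothetical stand-in for a script embedded in the remote page.
const pageScript = "document.title = 'Loaded: ' + document.title;";

// Run the script in a fresh context whose globals are exactly `globals`.
function runPageScript(source, globals) {
  const context = vm.createContext(globals);
  try {
    vm.runInContext(source, context, { timeout: 1000 });
    return { ok: true, context };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

// 1. Bare engine, no DOM: the script fails at its first use of `document`.
const bare = runPageScript(pageScript, {});

// 2. Same engine, with a hand-written stub for the one property it uses.
const stubbed = runPageScript(pageScript, { document: { title: 'Example' } });

console.log(bare.ok, bare.error);            // false ReferenceError
console.log(stubbed.context.document.title); // Loaded: Example
```

A real page would touch far more of the DOM than a stub like this can reasonably cover, which is why the comments point toward a full browser engine rather than a standalone interpreter.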

1 Answer

I think Jaxer is your only option. You can use Jaxer.Sandbox to render a remote page on the server and execute all the scripts embedded in that page. The resulting DOM is what you'd get in the Firefox web browser with JS enabled. Here's a simple tutorial featuring Jaxer.Sandbox for web scraping purposes.

pawel