I am trying to scrape data from an enterprise website built with JSF and ICEfaces. I am using C# and the RestSharp library.

I have no experience with JSP, JSF or ICEfaces at all, so I am trying to work out how to replicate what the site does using plain HTTP requests, so far without much success. The site has no concept of routing whatsoever (and if you accidentally press the browser's back button, you are logged out...).

What I have managed to do so far:

  • Make a POST request with credentials to the /login resource in order to log in
  • Retrieve the JSESSIONID cookie after login and store it in my CookieContainer
  • Use regexes to extract the ice.session and ice.view values from the page
  • Replicate a POST request to the block/send-receive-updates resource
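
To make the extraction step concrete, here is a minimal sketch of it. It is written in Python for brevity rather than the C# I am actually using, and it assumes the two values appear as hidden inputs named `ice.session` and `ice.view`; the sample markup, the values in it, and the helper name `extract_ice_ids` are hypothetical, and the real page may embed the values differently.

```python
import re

# Hypothetical markup: the real page may embed these values differently.
SAMPLE_PAGE = '''
<input type="hidden" name="ice.session" value="AbC123xyz"/>
<input type="hidden" name="ice.view" value="1"/>
'''


def extract_ice_ids(html):
    """Return (ice.session, ice.view) found in html; a missing value is None."""
    def find(name):
        # Assumes the name attribute is directly followed by the value attribute.
        pattern = r'name="' + re.escape(name) + r'"\s+value="([^"]*)"'
        match = re.search(pattern, html)
        return match.group(1) if match else None

    return find("ice.session"), find("ice.view")


session_id, view_number = extract_ice_ids(SAMPLE_PAGE)
```

Both values then go into the form-encoded body of the POST to block/send-receive-updates, together with the JSESSIONID cookie.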

When the original POST request is sent by the site's own JS code (while I am clicking around in the browser), it returns an XML response like this:

<updates>
    <update address="some form id" tag="table"> ... </update>
    <update address="content" tag="div"> ... </update>
    <update address="The ice.session id, the ice.view number separated with : followed by the string 'dynamic-code'" tag="script"> ... </update>
</updates>

However, when I take all the encoded POST parameters that the site sends with this request and replicate them in my C# code, my response contains only the last update (the script tag), like this:

<updates>
    <update address="The ice.session id, the ice.view number separated with : followed by the string 'dynamic-code'" tag="script"> ... </update>
</updates>
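
The difference between the two responses can be checked mechanically. The following is a minimal sketch (again in Python rather than C#) that lists which `update` elements the browser's response has and the replicated one lacks. The sample `address` values are placeholders, not the site's real ids, the helper names are hypothetical, and it assumes the response body parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# Placeholder responses: the address values are made up for illustration.
BROWSER_RESPONSE = '''<updates>
    <update address="form-id" tag="table">...</update>
    <update address="content" tag="div">...</update>
    <update address="SESSION:1:dynamic-code" tag="script">...</update>
</updates>'''

CLIENT_RESPONSE = '''<updates>
    <update address="SESSION:1:dynamic-code" tag="script">...</update>
</updates>'''


def update_targets(xml_text):
    """Return the (address, tag) pair of every <update>, in document order."""
    root = ET.fromstring(xml_text)
    return [(u.get("address"), u.get("tag")) for u in root.iter("update")]


def missing_updates(browser_xml, client_xml):
    """Return the updates present in the browser response but not in the client's."""
    received = set(update_targets(client_xml))
    return [t for t in update_targets(browser_xml) if t not in received]


missing = missing_updates(BROWSER_RESPONSE, CLIENT_RESPONSE)
```

With the placeholder data above, `missing` holds the `table` and `div` updates, which matches what I see: only the script update comes back.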

Does anyone have experience with scraping or testing these technologies who can help me figure out what I am doing wrong?

Thanks.

valorl
  • Is this helpful? http://stackoverflow.com/q/12175763 – BalusC Jun 15 '16 at 09:33
  • Yeah I have read something similar before, but it's not that helpful. I am replicating the exact same parameters that I can see the site do when clicking on buttons. There's no `javax.faces.ViewState` passed, only `ice.view` and the initial page also doesn't contain `javax.faces.ViewState` so I think maybe that part is done differently with IceFaces (probably with `ice.view`) – valorl Jun 15 '16 at 09:47
  • Older ICEfaces versions indeed have a proprietary ajax engine instead of the standard JSF one. Well, I've never really used ICEfaces myself, so I'm only guessing. I know they have a fairly decent "JavaScript disabled" fallback, so try simulating/tracking that instead of focusing on the ajax request-response payload in a regular web browser, which you won't ever properly simulate with merely an HTTP client (you basically need an HTML/JS client for that). – BalusC Jun 15 '16 at 09:53
  • It says `Javascript is blocked. ICEfaces cannot run.`. By HTML client, do you mean that I would have to use something like PhantomJS or Selenium, i.e. tools that actually run a browser? – valorl Jun 15 '16 at 09:59

0 Answers