1

Suppose I have retrieved HTML content from a website (over which I have no control), and that content contains a lot of JavaScript code that generates a significant part of what a layout engine (e.g. WebView) actually renders.

Is there a way I can render it myself?

For example, in the extreme case, suppose I am visiting a website that has almost nothing in its <body> but displays very rich TEXT content via a host of JavaScript functions (which obviously produce HTML).

How do I access/read that HTML result?

I am looking to do this on Android only.

Update, trying to provide more context for @abesto: if you go to facebook.com and copy/paste the rendered content into a text file, you'll get:

Facebook logo
Email   Password

Keep me logged in   Forgot your password?
Facebook helps you connect and share with the people in your life.
Sign Up
It's free and always will be.
First Name: 
Last Name:  
Your Email: 
Re-enter Email: 
New Password:   
I am:   
Birthday:   

Why do I need to provide this?
Security Check
This field is required.
Enter both words below, separated by a space.
Can't read the words below?Try different words or an audio captcha.
Please enter the words or numbers you hear.
Try different words or back to text.
Loading...
Text in the box:
What's this?

Back
Registering…
An error occurred. Please try again.

By clicking Sign Up, you are indicating that you have read and agree to the Terms of Use and Privacy Policy.
Create a Page for a celebrity, band or business.

    * Română
    * English (US)
    * Español
    * Português (Brasil)
    * Français (France)
    * Deutsch
    * Italiano
    * العربية
    * हिन्दी
    * 中文(简体)
    * »

Facebook © 2011 · English (US)
Mobile · Find Friends · Badges · People · Pages · About · Advertising · Developers · Careers · Privacy · Terms · Help

But if you look at the actual source (what you get in the HttpResponse), you'll see much more monstrous text... mostly JavaScript.

I am only interested in the result of that JavaScript. Any ideas how to accomplish this?

JohnK
  • Could you be more specific? What do you mean "render it myself?" – Matt Ball Mar 10 '11 at 22:10
  • Why is simply including all the page (including JS) not an option? – abesto Mar 10 '11 at 22:19
  • @Matt Ball: By "render it myself" I mean come up with "more or less" what WebView comes up. Unfortunately, only the screen has access to WebView's output... and I need to do some analysis on the page content. Any ideas? – JohnK Mar 10 '11 at 22:25
  • @abesto: Take the extreme case I described above, for example: There is nothing between <body> and </body>, but when you load the page in a web browser, its JavaScript reveals a whole new world of HTML to you... (your eyes only). I want my text analysis program to "see" that HTML, too. Any ideas? – JohnK Mar 10 '11 at 22:28
  • I'll risk my eternal soul here for a minimum-effort solution: iframe? Also, you could pull all the – abesto Mar 10 '11 at 22:32
  • It sounds like you want a headless browser that runs on iOS. You don't care about showing the actual content, you just want programmatic access to the DOM. Right? – Matt Ball Mar 10 '11 at 22:38
  • @abesto: Your suggestions look closer than others' to what I am looking for. Can you describe this in more detail? – JohnK Mar 11 '11 at 04:31
  • @Matt Ball: Yes, you are absolutely right (except that I want this for Android, not iOS :) – JohnK Mar 11 '11 at 04:32
  • Just woke up and re-read the question. My ideas depend on your app running in a browser, which likely isn't the case. Sorry. – abesto Mar 11 '11 at 06:04

2 Answers

1

I think the answer is yes, but don't do that.

If I had to implement a solution for translating 'Facebook' to a mobile phone, I could set up a server, maybe on Amazon EC2, and run the browser there, using a browser automation tool such as Watir to simulate the clicks and scrape the data off the page. I think it's too much to hope that you could run that efficiently behind the scenes on the phone itself.

However, the better solution might be to use Firebug, Fiddler, etc. to reverse-engineer the Ajax calls being sent and find a way to get the underlying data. Or maybe you just need to reverse-engineer the JS :(.

JoshRivers
  • JoshRivers: I understand your point, but I still want an engine running LOCALLY on my Android smartphone. There already is such an engine on Android (WebView), but for some strange reason Google doesn't provide access to the HTML it actually displays. So I am looking for additional ways or tricks (or a hack) to make WebView's output available to me in String form. Any ideas? – JohnK Mar 11 '11 at 04:29
  • I started looking at hacking the WebView...but then I thought 'perhaps it's easier to add some JavaScript and have it do the work for me?' That led me to this: http://lexandera.com/2009/01/extracting-html-from-a-webview/ ....I think it's your solution? – JoshRivers Mar 11 '11 at 17:06
  • JoshRivers: Injecting (to the downloaded copy) some Javascript that will do the work is a GREAT IDEA. The link you provided shows how to get the raw HttpResponse (i.e. Javascript functions in source form). Is there a way to inject a script that would render the entire page? If you find such a way, your answer will be the accepted one. :) I am already giving you +1x2... Thank you! – JohnK Mar 11 '11 at 19:11
  • So what I'm thinking is this: you can already load up the WebView with your content page, and you have a method for extracting data from the WebView using JavaScript. So you load the page you want to scrape, let it render the content you're looking for, then find the HTML element with the rendered content and extract its innards. The javascript: URL could be generated using a tool like http://benalman.com/projects/run-jquery-code-bookmarklet/. – JoshRivers Mar 11 '11 at 20:42
  • I think the WebView onNewPicture event is what you'll need to monitor to find when the page has completed... http://developer.android.com/reference/android/webkit/WebView.PictureListener.html#onNewPicture(android.webkit.WebView, android.graphics.Picture) – JoshRivers Mar 11 '11 at 20:43
  • @JoshRivers onNewPicture indeed signals when the page has completed but it only provides the raw Javascript... BACK TO SQUARE ONE. There must be a way to "translate" that Javascript to HTML. What is it? – JohnK Mar 14 '11 at 12:39
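
A minimal sketch of the injection idea from the comments above, not a definitive implementation. It only builds the `javascript:` URL strings in plain Java so that it can run anywhere; the Android wiring is described in comments and assumes the standard `WebView.addJavascriptInterface`, `WebViewClient.onPageFinished` and `WebView.loadUrl` calls. The bridge name `HTMLOUT` and the callback `processHTML` are hypothetical names chosen for illustration. The point is that the injected script reads the live DOM after the page's own scripts have run, so it should serialize what was rendered rather than the raw response.

```java
// Sketch only: builds the javascript: URLs that would be handed to WebView.loadUrl().
// BRIDGE and CALLBACK are hypothetical names, not part of any Android API.
final class RenderedHtmlScripts {
    static final String BRIDGE = "HTMLOUT";       // assumed name passed to addJavascriptInterface()
    static final String CALLBACK = "processHTML"; // assumed method on the bridge object

    // Wraps a JavaScript expression so that its value is passed to the Java bridge.
    static String toBridgeUrl(String jsExpression) {
        return "javascript:window." + BRIDGE + "." + CALLBACK + "(" + jsExpression + ");";
    }

    // Serializes the current DOM, including nodes created by scripts after load.
    static String renderedHtmlUrl() {
        return toBridgeUrl("document.documentElement.outerHTML");
    }

    // Asks for the page's rendered text only, which approximates a copy/paste of the page.
    static String renderedTextUrl() {
        return toBridgeUrl("document.body.innerText");
    }

    public static void main(String[] args) {
        // On Android (not runnable here), the assumed wiring would be:
        //   webView.getSettings().setJavaScriptEnabled(true);
        //   webView.addJavascriptInterface(bridgeObject, BRIDGE);
        //   in WebViewClient.onPageFinished(view, url): view.loadUrl(renderedHtmlUrl());
        System.out.println(renderedHtmlUrl());
        System.out.println(renderedTextUrl());
    }
}
```

On API level 17 and later, the bridge method has to be annotated with `@JavascriptInterface`. Note also that `onPageFinished` can fire before asynchronous scripts have finished, so the extraction may need to be delayed or repeated until the content of interest appears.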
0

It sounds like you want something like this:

http://jsconsole.com/

You basically load the url and mess with it. You just need to hook something into it to do it programmatically.

Take a look at their remote debugging explanation.

Since it's hooked up to your Android over a stream, you can use any old PC technology you want to sniff the HTML.

Raynos
  • Thanks but this is really what I want. What I want is an engine running LOCALLY on my Android smartphone. There already is such an engine on Android (WebView), but for some strange reason Google doesn't provide access to the HTML it actually displays. So I am looking for additional ways or tricks (or a hack) to make WebView's output available to me in String form. – JohnK Mar 11 '11 at 04:27
  • I meant "but this ISN'T really what I want." Sorry. – JohnK Mar 11 '11 at 04:36
  • @JohnK Write a new browser for android. – Raynos Mar 11 '11 at 09:07
  • If @JoshRivers's idea doesn't turn out to be the solution, then you're right: I will have no choice but to write a new browser. I was just hoping to avoid that. :) – JohnK Mar 11 '11 at 19:14