JavaScript to Html

Question

Given a web page with JavaScript code, I would like to generate the resulting HTML automatically (either with a CLI tool or with a library in some language).

For example, given test.html

<!DOCTYPE html>
<html>
  <body>
    <p id="demo"></p>
    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>

I would like to get as a result

<html>
  <body>
    <p id="demo">Hello JavaScript!</p>
    <script>
      document.getElementById("demo").innerHTML = "Hello JavaScript!";
    </script>
  </body>
</html>

Could you serve the page on your localhost then scrape it with phantom? — Daniel Lizik, Nov 19 '15 at 17:17
I think @Tim is trying to parse an HTML file and insert text in the <p> tag. BeautifulSoup should do the work. — Yu Wu, Nov 19 '15 at 17:22
How about using a WebClient and getting the resulting dom from it? I imagine you could do this but haven't tested it, hence the comment instead of an answer. — Darren Gourley, Nov 19 '15 at 17:25
@Daniel_L The page is served from a host I have no control over — Timofey, Nov 19 '15 at 18:13
@DarrenGourley are you speaking about htmlunit http://htmlunit.sourceforge.net/? If so, then unfortunately it breaks while processing my page — Timofey, Nov 19 '15 at 18:14
@YuWu not exactly. The JavaScript might invoke some external URL and modify the current page's DOM. The given example is simple and serves for demo purposes only. — Timofey, Nov 19 '15 at 18:18
@AdamBuchananSmith thanks, will check their APIs http://doc.jsfiddle.net/api/ — Timofey, Nov 19 '15 at 18:19
Sorry @Tim, I may have gotten the wrong end of the stick with your question, for some reason I assumed it had a .NET tag. WebClient is baked in to .NET and allows you to request Web pages as you would in a browser, only programmatically. — Darren Gourley, Nov 19 '15 at 18:22
You're misunderstanding @Daniel_L's comment, which is actually a correct answer (almost). Just run a phantom program which loads the page (from the host), then grab the full page content. — , Nov 19 '15 at 19:13
@torazaburo the comment was modified. I will definitely try http://phantomjs.org/screen-capture.html thanks — Timofey, Nov 19 '15 at 19:17
Screen capture is close to what you want, but not exactly. You'll want to grab the entire page HTML and save it to file somewhere most likely. — , Nov 19 '15 at 19:22
@torazaburo do you perhaps have an example? If you feel you have the right solution, you can write an answer and get it accepted :) — Timofey, Nov 19 '15 at 19:24

score 0 · Answer 1 · answered Nov 19 '15 at 17:32

After a quick search, it looks like WatiN will do what you want.

It's aimed at automated testing, but when it loads a page it executes all of the page's JavaScript, including Ajax calls. It looks like you can grab the resulting HTML from it too.

Darren Gourley


Thanks, I will check this soon. Are you familiar with some other library for some open source stack (java, scala, clojure)? – Timofey Nov 19 '15 at 18:16
Sorry @Tim, as per my comment above I thought this question had a .NET tag. I'm not aware of any other libraries that would do this. – Darren Gourley Nov 19 '15 at 18:24

score 0 · Accepted Answer · answered Nov 22 '15 at 05:35

This answer is based on @torazaburo's comment.

PhantomJS can evaluate a page's JavaScript and output the resulting HTML.

Here is what it could look like, run as phantomjs load_page.js path_to/test.html:

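// load_page.js: load a page in PhantomJS, let its scripts run, and save the resulting HTML.
var page = require('webpage').create(),
    system = require('system'),
    fs = require('fs'),
    page_address;

// system.args[0] is the script name; the page address must follow it.
if (system.args.length === 1) {
  console.log('Usage: phantomjs ' + system.args[0] + ' <page_to_load:http://www.google.com>');
  phantom.exit(1);
} else {
  page_address = system.args[1];

  page.open(page_address, function (status) {
    console.log('Status: ' + status);
    if (status === 'success') {
      // page.content holds the page's current DOM serialized as HTML.
      fs.write('phantom_result.html', page.content, 'w');
    }
    phantom.exit();
  });
}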
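
The script above writes the HTML as soon as the page reports that it has loaded. As noted in the comments, the real page's JavaScript might call an external URL and change the DOM afterwards, so it may help to wait until network activity has gone quiet before saving. The following is only a sketch under that assumption: isSettled, QUIET_MS and MAX_WAIT_MS are hypothetical names and values chosen for illustration, while page.onResourceRequested, page.onResourceReceived, page.content and fs.write are the PhantomJS calls it relies on. It is written in ES5 because PhantomJS does not support newer syntax.

```javascript
// Hypothetical helper (not a PhantomJS API): the page counts as settled when
// no requests are in flight and the network has been quiet for quietMs.
function isSettled(pendingRequests, msSinceLastActivity, quietMs) {
  return pendingRequests === 0 && msSinceLastActivity >= quietMs;
}

// Only run the PhantomJS part when the script is executed by phantomjs.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var system = require('system');
  var fs = require('fs');

  var QUIET_MS = 500;      // assumed quiet period, tune for the target page
  var MAX_WAIT_MS = 10000; // assumed upper bound so the script always exits
  var pending = 0;
  var lastActivity = Date.now();

  // Count every request the page starts.
  page.onResourceRequested = function () {
    pending += 1;
    lastActivity = Date.now();
  };

  // A response is reported in stages; 'end' marks a finished request.
  page.onResourceReceived = function (response) {
    if (response.stage === 'end') {
      pending = Math.max(0, pending - 1);
      lastActivity = Date.now();
    }
  };

  page.open(system.args[1], function (status) {
    if (status !== 'success') {
      console.log('Failed to load page');
      phantom.exit(1);
      return;
    }
    var started = Date.now();
    // Poll until the page settles or the maximum wait is exceeded.
    var timer = setInterval(function () {
      var now = Date.now();
      if (isSettled(pending, now - lastActivity, QUIET_MS) ||
          now - started > MAX_WAIT_MS) {
        clearInterval(timer);
        fs.write('phantom_result.html', page.content, 'w');
        phantom.exit();
      }
    }, 100);
  });
}
```

It would be run the same way as the original script, for example phantomjs load_page_wait.js path_to/test.html, where load_page_wait.js is just an assumed file name. A quiet network does not guarantee that every script has finished, so the timings may need adjusting for a given page.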
