I need to extract information contained in the HTML and JavaScript of a site. For the HTML I have succeeded by using the Java library jsoup, but now I would like to extract the content of a variable within the JS files of the same site.

How can I do it? Thanks in advance.

Daniele
  • Unclear: do you want to extract the content of a JavaScript file, or do you want to parse the HTML page with dynamic content resulting from JavaScript processing? Without a URL and a description of the desired output, the question is too general. HtmlUnit might help you (headless Java browser with limited JS support). – Frederic Klein Sep 01 '16 at 10:37
  • I am creating an Android application and I need to extract the value of a variable contained in a JavaScript file on a remote site such as www.google.com, and I need to do it dynamically because the value of this variable changes every time. – Daniele Sep 01 '16 at 11:13
  • Since it is Android, you can use a WebView. See related answer: http://stackoverflow.com/a/39174441/1661938 – Frederic Klein Sep 01 '16 at 11:19

1 Answer

I would like to extract the content of a variable within the JS files of the same site

Try this:

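// ** Exception handling removed ** //

import java.nio.charset.Charset;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect(websiteUrl).get();

// Select only <script> elements that reference an external file
String jsFilesCssQuery = "script[src]";
for (Element script : doc.select(jsFilesCssQuery)) {
    // You may add further checks on the script element found here...
    // ...

    // Download JS code
    Connection.Response response = Jsoup
        .connect(script.absUrl("src"))
        .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")
        .ignoreContentType(true) // Otherwise Jsoup refuses non-HTML content such as JS
        .referrer(doc.location())
        .execute();

    // response.charset() is null when the server declares no charset,
    // and Charset.forName(null) throws, so fall back to UTF-8
    String charsetName = response.charset();
    String jsCode = new String(
        response.bodyAsBytes(),
        Charset.forName(charsetName != null ? charsetName : "UTF-8")
    );

    // Do extraction on jsCode here...
    // ...
}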
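
For the extraction step itself, one possible approach is a regular expression over the downloaded source. The following is only a rough sketch, not a JavaScript parser: it assumes the variable is declared on a single line as `var foo = ...;` (or `let`/`const`) and that the value you want is the first quoted string on the right-hand side. The class name `JsVariableExtractor` and both helper methods are hypothetical names made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JsVariableExtractor {

    // Returns the raw text assigned to the variable (everything between "="
    // and the next ";" or line break), or null if no declaration is found.
    static String extractVariable(String jsCode, String variableName) {
        Pattern declaration = Pattern.compile(
            "\\b(?:var|let|const)\\s+" + Pattern.quote(variableName)
                + "\\s*=\\s*([^;\\r\\n]+)");
        Matcher matcher = declaration.matcher(jsCode);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    // Returns the content of the first single- or double-quoted string
    // in the given text, or null if there is none.
    static String firstQuotedString(String text) {
        Matcher matcher = Pattern.compile("([\"'])(.*?)\\1").matcher(text);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        // Stand-in for the jsCode string downloaded in the loop above
        String jsCode = "var bar = 1;\nvar foo = [ \"http://www.google.com\" ];\n";

        String rawValue = extractVariable(jsCode, "foo");
        if (rawValue != null) {
            System.out.println(firstQuotedString(rawValue));
        }
    }
}
```

If the value spans several lines, contains a `;` inside a string, or is computed at runtime, a regex like this will not be enough and a real JavaScript engine (for example the WebView suggested in the comments) is probably the safer route.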
Stephan
  • Perfect, this works. Now as I can extrapolate the contents of a variable inside of this file ? – Daniele Sep 02 '16 at 11:01
  • `String jsCode = new String( response.bodyAsBytes(), Charset.forName( response.charset() ) );` This code causes the app to crash. – Daniele Sep 02 '16 at 11:13
  • @danielecastronovo *"Now as I can extrapolate the contents of a variable inside of this file ?"* Sorry, I didn't understand this part of your comment. Concerning the crash of the app, try this instead: `String jsCode = new String( response.bodyAsBytes())`. – Stephan Sep 02 '16 at 12:31
  • Within this JavaScript file there is a variable whose name I know, such as foo, but whose content always changes, so how can I extract the content of this variable? E.g. given var foo = [ http://www.google.com ], I want to get the URL http://www.google.com dynamically. – Daniele Sep 02 '16 at 13:11
  • @danielecastronovo Please update your question with the variations you have found. – Stephan Sep 02 '16 at 13:34