
I am using the Jsoup library to read a URL. The page at this URL has text within a few <script> tags. Is it possible for me to obtain the text within each <script> tag? Please note that I am not asking to parse a JavaScript file, as I am already aware that Jsoup does not do that. The page's source code itself has text within a script tag, and I need that text.

Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();

Element div = doc.select("script").first();
for (Element element : div.children()) {
    System.out.println(element.toString());
}

This is what one of the script tags looks like in the source code:

<script type="text/javascript">
(function() {
...
})();
</script>
M9A

4 Answers


Alternatively, you could use the Element#html() method, which returns the inner HTML of an element.

Since Jsoup 1.11.1: use the more efficient Element#selectFirst() method to find the script element.

Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Element scriptElement = doc.selectFirst("script");

// selectFirst() returns null when the page has no <script> element
String jsCode = (scriptElement != null) ? scriptElement.html() : "";
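As a minimal sketch (Jsoup 1.11.1 or later assumed), the null check can be folded into a small helper that returns an `Optional`. The class and method names `FirstScript` and `firstScriptText` are hypothetical, and the HTML is parsed from a string with `Jsoup.parse` instead of fetched with `Jsoup.connect`, so the example runs without a network connection.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class FirstScript {

    private FirstScript() {}

    /** Returns the code inside the first <script> element, or empty if the page has none. */
    static Optional<String> firstScriptText(String html) {
        Document doc = Jsoup.parse(html);
        // selectFirst() returns null when nothing matches (Jsoup 1.11.1+)
        Element scriptElement = doc.selectFirst("script");
        return Optional.ofNullable(scriptElement).map(script -> script.html());
    }
}
```

Callers can then write `FirstScript.firstScriptText(html).orElse("")` and never touch a null reference.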

Up to Jsoup 1.10.3: Combine Element#select() and Elements#first() calls to find the script element.

Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Element scriptElement = doc.select("script").first();

// first() returns null when the page has no <script> element
String jsCode = (scriptElement != null) ? scriptElement.html() : "";
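For script elements, `Element#data()` is a closely related alternative to `html()`: it returns the raw contents of the element's `DataNode` children. The sketch below (class name hypothetical, page parsed from a string, default output settings assumed) uses the pre-1.11.1 `select().first()` style and compares the two. In both cases characters such as `<` inside the script come back as-is rather than escaped to `&lt;`; with pretty-printing enabled (the default), `html()` may also trim surrounding whitespace, while `data()` keeps it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class ScriptHtmlVsData {

    private ScriptHtmlVsData() {}

    /** Inner HTML of the first script, or "" if there is none. */
    static String viaHtml(String page) {
        Element script = Jsoup.parse(page).select("script").first();
        return script == null ? "" : script.html();
    }

    /** Raw data of the first script as it appeared in the source, or "" if there is none. */
    static String viaData(String page) {
        Element script = Jsoup.parse(page).select("script").first();
        return script == null ? "" : script.data();
    }
}
```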
Stephan

Yes. You can use Element#getElementsByTag() to get all the script tags. The content of each script tag is represented by a DataNode.

Document doc = Jsoup.connect("http://stackoverflow.com/questions/16780517/java-obtain-text-within-script-tag-using-jsoup").timeout(10000).get();
Elements scriptElements = doc.getElementsByTag("script");

for (Element element : scriptElements) {
    // a script's content is stored in DataNode children, not child Elements
    for (DataNode node : element.dataNodes()) {
        System.out.println(node.getWholeData());
    }
    System.out.println("-------------------");
}
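This also explains why the `div.children()` loop in the question prints nothing: assuming the script contains only code, its body is held in a `DataNode`, not in child `Element`s, so `children()` is empty while `dataNodes()` is not. A small sketch under that assumption (class and method names hypothetical, HTML parsed from a string so it runs offline):

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class ScriptNodes {

    private ScriptNodes() {}

    /** Number of child elements of the first script; a plain script body has none. */
    static int childElementCount(String html) {
        Element script = Jsoup.parse(html).select("script").first();
        return script == null ? 0 : script.children().size();
    }

    /** Collects the text of every DataNode under every <script> element, in document order. */
    static List<String> allScriptData(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element script : doc.getElementsByTag("script")) {
            for (DataNode node : script.dataNodes()) {
                result.add(node.getWholeData());
            }
        }
        return result;
    }
}
```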
Ken Chan
  • Thank you @KenChan, it worked perfectly. I'm using `String scriptdata = node.getWholeData();` **But how do I get only the second script on the page?** – Florida Jun 16 '15 at 20:39
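To pick out a single script such as the second one, one option is to index into the `Elements` list returned by `getElementsByTag()`; it is an ordinary `List`, in document order, with zero-based indices. A hedged sketch (the `NthScript` class and `nthScript` method are hypothetical, and the page is parsed from a string):

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical helper class, for illustration only.
final class NthScript {

    private NthScript() {}

    /** Returns the data of the script at zero-based position index, or empty if there is none. */
    static Optional<String> nthScript(String html, int index) {
        Document doc = Jsoup.parse(html);
        Elements scripts = doc.getElementsByTag("script");
        if (index < 0 || index >= scripts.size()) {
            return Optional.empty();
        }
        // Element#data() concatenates the contents of all DataNodes in the script
        return Optional.of(scripts.get(index).data());
    }
}
```

For "the second script on the page", call `nthScript(html, 1)`.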
Document doc = Jsoup.parse(html);
Elements scripts = doc.getElementsByTag("script");
for (Element script : scripts) {
    System.out.println(script.data());
}
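`Element#data()` returns the combined contents of a script's `DataNode` children, so for an external script such as `<script src="app.js"></script>` it returns an empty string. The sketch below keeps only inline scripts; the class and method names are hypothetical, and skipping whitespace-only scripts is a design choice, not something the loop above does.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class InlineScripts {

    private InlineScripts() {}

    /** Returns script code in document order, skipping scripts whose data is empty or whitespace-only (e.g. src=... scripts). */
    static List<String> inlineScriptCode(String html) {
        List<String> code = new ArrayList<>();
        for (Element script : Jsoup.parse(html).getElementsByTag("script")) {
            String data = script.data();
            if (!data.trim().isEmpty()) {
                code.add(data);
            }
        }
        return code;
    }
}
```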
Mojtaba Yeganeh
  • Although this code may answer the question, providing additional context regarding _why_ and/or _how_ it answers the question would significantly improve its long-term value. Please edit your answer to add some explanation. – Toby Speight Apr 25 '16 at 13:47

For your case, the solution would be as follows:

Document doc = Jsoup.connect("http://www.example.com").timeout(10000).get();
Elements scripts = doc.select("script");

String scriptData = null; // stays null if no text/javascript script is found
for (Element script : scripts) {
    String type = script.attr("type");
    if (type.equals("text/javascript")) {
        scriptData = script.data(); // your text from the script
        break;
    }
}
mad_fox
  • You could simplify the code with CSS query syntax: `Elements scripts = doc.select("script[type=text/javascript]");` – Tanzer Oct 20 '22 at 19:51
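Following that suggestion, here is a sketch that combines the attribute selector with `selectFirst()` (Jsoup 1.11.1 or later assumed; the class and method names are hypothetical, and the page is parsed from a string). Because HTML5 makes the `type` attribute optional, scripts that omit it will not match this selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class TypedScript {

    private TypedScript() {}

    /** Data of the first script whose type attribute is text/javascript, or null if none matches. */
    static String firstJavascript(String html) {
        Element script = Jsoup.parse(html).selectFirst("script[type=text/javascript]");
        return script == null ? null : script.data();
    }
}
```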