I am trying to select a script tag on a page based on the text it contains:

Document doc = Jsoup.parse(somehtml);
Elements ele = doc.select("script:contains(accountIndex)");

The script tag on the page looks like this:

<script>(function() {var vm = ko.mapping.fromJS({
"accountIndex": 1,
"accountNumber": "*******",
"hideMoreDetailsText": "Hide More Details",
"viewAccountNumberText": "Show Account Number",
"hideAccountNumberText": "Hide Account Number",
 });window.AccountDetails = vm;})();</script>

I am able to select this script tag if I pass its full CSS locator, like this:

  Elements ele=doc.select("body > script:nth-child(44)");

There are many script tags on the page, so the second approach is not generic; the position may change in the future.

Can somebody please tell me what the issue is with the first approach? I am able to select other tags on the page with jsoup's :contains selector.

Ravi

2 Answers

The selector :contains(text) looks for an element that has that text value. A script doesn't have text, it has data (otherwise the JS would be visible in the browser). You can use the :containsData(data) selector instead.

E.g.:

Elements els = doc.select("script:containsData(accountIndex)");

Here's an example. The Selector documentation has all the handled query types (which is not just strict CSS).
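To make the text/data distinction concrete, here is a minimal sketch. The class name, method names, and sample HTML are made up for illustration, and it assumes a jsoup version that supports :containsData (1.10.2 is reported to work in the comments). It runs the two selectors against a small page so the results can be compared side by side.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ScriptSelectorSketch {
    // Hypothetical sample page: one paragraph with visible text and two scripts.
    static final String HTML =
        "<html><body>"
        + "<p>accountIndex appears here as visible text</p>"
        + "<script>var vm = {\"accountIndex\": 1};</script>"
        + "<script>var other = 2;</script>"
        + "</body></html>";

    // Counts elements matched by tag:contains(needle), which checks element text.
    static int countByText(String html, String tag, String needle) {
        Document doc = Jsoup.parse(html);
        return doc.select(tag + ":contains(" + needle + ")").size();
    }

    // Counts elements matched by tag:containsData(needle), which checks script/style data.
    static int countByData(String html, String tag, String needle) {
        Document doc = Jsoup.parse(html);
        return doc.select(tag + ":containsData(" + needle + ")").size();
    }

    public static void main(String[] args) {
        System.out.println("p:contains           -> " + countByText(HTML, "p", "accountIndex"));
        System.out.println("script:contains      -> " + countByText(HTML, "script", "accountIndex"));
        System.out.println("script:containsData  -> " + countByData(HTML, "script", "accountIndex"));
    }
}
```

If the explanation above holds, the script:contains line should report no matches, while script:containsData should find only the script that mentions accountIndex.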

Jonathan Hedley
  • It's working! Thanks a lot! "A script doesn't have text, it has data" is a really clear explanation of why that was not working. – Ravi Jan 23 '17 at 10:42
  • One more question: many hidden tags are also parsed by jsoup, although they are not displayed in the browser. Why does jsoup treat their content as text, but not the content of a script tag? – Ravi Jan 23 '17 at 10:44
  • Like what, for example? – Jonathan Hedley Jan 23 '17 at 15:42
  • This gives an error in jsoup 1.7.2 but works fine in 1.10.2. Is that expected behaviour? – Ravi Jan 23 '17 at 19:33
  • Yes. 1.7.2 is four years old. – Jonathan Hedley Jan 23 '17 at 20:55
  • Is there any way to get it done on jsoup 1.7.2? I have version 1.7.2 in my production environment. – Ravi Jan 24 '17 at 05:14
  • You could select all the script elements, then iterate and check their contents via getWholeData(). But why not just upgrade? Current version runs faster and leaner and is more convenient than 1.7.2. – Jonathan Hedley Jan 24 '17 at 17:03
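The fallback suggested in the last comment, for versions without :containsData, could look roughly like the sketch below. The class and method names are hypothetical. It uses Element.data(), which combines the data of the element's child data nodes, rather than calling getWholeData() on each DataNode directly. It has not been verified against 1.7.2 specifically.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScriptDataFallback {
    // Returns every <script> element whose data contains the given string.
    static List<Element> scriptsContaining(Document doc, String needle) {
        List<Element> matches = new ArrayList<Element>();
        for (Element script : doc.select("script")) {
            // data() returns the script body; text() would be empty here.
            if (script.data().contains(needle)) {
                matches.add(script);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for the real page.
        Document doc = Jsoup.parse(
            "<script>var vm = {\"accountIndex\": 1};</script>"
            + "<script>var other = 2;</script>");
        for (Element script : scriptsContaining(doc, "accountIndex")) {
            System.out.println(script.data());
        }
    }
}
```

Unlike the selector, String.contains is case-sensitive, so the search string has to match the script source exactly.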

jsoup only supports CSS selectors, and those only allow you to select based on CSS classes and properties of the DOM elements, not their text contents (see "CSS selector based on element text?"). You could try using another framework for parsing and querying the HTML, for example XOM and TagSoup, as described here: https://stackoverflow.com/a/11817487/7433999

Or you could add CSS classes to your script tags like this:

<script class="class1">
// script1
</script>
<script class="class2">
// script2
</script>

Then you can select the script tags again via CSS using jsoup:

Elements elements = document.select("script.class1");
ralph.mayr
  • jsoup does allow selection based on contained text; I am able to select other elements that way. – Ravi Jan 21 '17 at 18:38