
I'm trying to extract inline JavaScript that differs across thousands of URLs and is nested at various levels within the markup.

As I familiarize myself with XPath syntax, I am trying to see if anyone knows a good way to target JavaScript. For example:

<script type="text/javascript"> ...data_#...</script>
<script type="text/javascript"> ...data_#...</script>
<script type="text/javascript"> ...data_n...</script>
<script type="text/javascript"> ...data_#...</script>
<script type="text/javascript"> ...data_#...</script>

The only unique identifier within the <script>...data_n...</script> that I am attempting to extract is that it contains:

var tabsRelated = ...

Within the confines of XPath, does anyone know a way to find the script that contains that variable and target the entire script? Sorta like:

//script[inner.text contains='var tabsRelated'

(I know this syntax is not proper.)

gen_Eric
  • possible duplicate of [xpath to get Node containing text](http://stackoverflow.com/questions/6442430/xpath-to-get-node-containing-text) – Marc B Nov 07 '11 at 19:30
  • The question I am asking refers to a more complex problem. In the cited discussion text() seems only to apply to HTML elements. I am unable to use this to isolate the above mentioned inline javascript. –  Nov 07 '11 at 20:09
  • XPath has no concept of JavaScript. It's just plain text as far as string searching is concerned. Find the script nodes, and check if their text value contains the string you want. – Marc B Nov 07 '11 at 20:10

1 Answer


Use:

//script[contains(., $someDistinguishingValue)]

where $someDistinguishingValue should be replaced with the corresponding value. For example, the above XPath expression may be generated dynamically as a string, and that string then evaluated as an XPath expression using the available XPath API (such as the DOM method SelectNodes()).
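As a rough illustration of the expression above, here is a minimal sketch in Python. It assumes the third-party lxml library is installed; the sample page, the `find_scripts` helper, and the `needle` variable name are all hypothetical and only stand in for the real pages. Rather than building the expression as a string, it passes the distinguishing value as an XPath variable, which lxml accepts as a keyword argument to `xpath()`.

```python
from lxml import html  # assumption: lxml is available (pip install lxml)

# Hypothetical page: several inline scripts at different nesting levels,
# only one of which declares tabsRelated.
PAGE = """
<html><body>
<script type="text/javascript">var data_1 = 1;</script>
<div><script type="text/javascript">var data_2 = 2;</script></div>
<div><div><script type="text/javascript">var tabsRelated = ["a", "b"];</script></div></div>
</body></html>
"""


def find_scripts(page_source, needle):
    """Return every <script> element whose text contains `needle`."""
    tree = html.fromstring(page_source)
    # $needle is an XPath variable; lxml binds it from the keyword argument,
    # so no quoting or escaping of the search string is needed.
    return tree.xpath("//script[contains(., $needle)]", needle=needle)


matches = find_scripts(PAGE, "var tabsRelated")
for script in matches:
    # .text holds the inline JavaScript source of the matched element
    print(script.text)
```

If the value is fixed, the same query can be written with a literal, as in `//script[contains(., 'var tabsRelated')]`.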

Dimitre Novatchev