
I am using the Html Agility Pack to scrape a website that uses JavaScript to populate its content dynamically.

Basically, I was searching for the XPath "//div[@class='PricingInfo']", but that div node is written to the DOM by JavaScript.

So, when I load the page through the Html Agility Pack, the XPath mentioned above matches nothing.

It turns out there is a comment before a particular script block I want to parse.

<!--Module 328 Buying Options Table-->
<script type="text/javascript" language="JavaScript">
    var data = {
        price: 30.00
    }
</script>

This site has many script blocks, so I need to narrow the search down by finding the auto-generated comment <!--Module 328 Buying Options Table-->; the script element that follows that comment as a sibling would be the correct script block.

Any idea on how I can search for a particular comment and then just get the adjacent script block?

Thank you!

Matthew Flaschen
Abe

1 Answer

htmlDoc.DocumentNode.SelectSingleNode("//comment()[contains(., 'Buying Options')]/following-sibling::script")
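
For context, here is a minimal sketch of how that expression might be used end to end. It assumes the HtmlAgilityPack NuGet package is referenced and that the comment and the script share the same parent element; the `ScriptFinder` class, the `FindScriptAfterComment` helper, and the sample markup are made up for illustration and are not part of the library.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

static class ScriptFinder
{
    // Hypothetical helper: returns the raw content of the first <script>
    // sibling that follows the "Buying Options" comment, or null if the
    // comment or the script is missing.
    public static string FindScriptAfterComment(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Same XPath as above; SelectSingleNode returns the first match.
        HtmlNode script = htmlDoc.DocumentNode.SelectSingleNode(
            "//comment()[contains(., 'Buying Options')]/following-sibling::script");

        return script == null ? null : script.InnerHtml;
    }
}

class Program
{
    static void Main()
    {
        // Sample markup modeled on the snippet in the question.
        const string html = @"<html><body><div>
<!--Module 328 Buying Options Table-->
<script type=""text/javascript"">
    var data = {
        price: 30.00
    }
</script>
</div></body></html>";

        string scriptText = ScriptFinder.FindScriptAfterComment(html);
        Console.WriteLine(scriptText ?? "Script block not found.");
    }
}
```

Checking for null matters here: if the site changes the comment text or moves the script to a different parent, SelectSingleNode returns null rather than throwing.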
Matthew Flaschen
  • worked like a charm. thank you! Now, I need a way to parse out the Javascript object. – Abe Oct 02 '10 at 03:32
  • Just to add one more thing: once I got the script node, I was able to parse out the information I needed by using regular expressions. Thanks! – Abe Oct 02 '10 at 08:28
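
The comment above does not show the regular expression that was used. As a rough sketch of that approach, assuming the script text looks like the sample in the question (an unquoted numeric `price` property), something along these lines could pull the value out. The `PriceParser` class and `ExtractPrice` method are hypothetical names, and the pattern would need adjusting for quoted values or other properties.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class PriceParser
{
    // Matches entries such as "price: 30.00"; assumes an unquoted,
    // non-negative numeric literal with an optional decimal part.
    private static readonly Regex PricePattern =
        new Regex(@"\bprice\s*:\s*(?<value>\d+(?:\.\d+)?)");

    // Hypothetical helper: returns the parsed price, or null when the
    // script text contains no matching "price" entry.
    public static decimal? ExtractPrice(string scriptText)
    {
        Match match = PricePattern.Match(scriptText);
        if (!match.Success)
        {
            return null;
        }

        // Invariant culture so "." is always read as the decimal separator.
        return decimal.Parse(match.Groups["value"].Value, CultureInfo.InvariantCulture);
    }
}

class Program
{
    static void Main()
    {
        // Script body copied from the question's sample.
        const string scriptText = @"
    var data = {
        price: 30.00
    }";

        decimal? price = PriceParser.ExtractPrice(scriptText);
        Console.WriteLine(price.HasValue
            ? price.Value.ToString(CultureInfo.InvariantCulture)
            : "No price found.");
    }
}
```

A regular expression is workable for a flat object like this one; for nested or more complex JavaScript objects, a proper parser would be more robust.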