
I am scraping some data from a website.

The HTML looks like this:

<!-- header section -->

<html>
<head>

    <table class="inputsection" width="100%" border="0" cellspacing="0" cellpadding="0">
        <tr valign="top">
            <td width="70%">


<script type="text/javascript">
    var marketInfos = new Array();

         marketInfos[0] =
            createMarket('03/04 Annual Auction','1','Cleared');

I need to retrieve the `marketInfos` array, which has around 800 entries.

I tried using HtmlAgilityPack, but it does not return the script data I am looking for. Here is the actual HTML: Actual HTML

I tried printing the InnerText/InnerHtml of all script nodes, but the one I am looking for is missing.

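                // Requires: using System.Diagnostics; using HtmlAgilityPack;
                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(response);
                // Select only tables with class="inputsection" instead of checking every attribute.
                var tables = doc.DocumentNode.SelectNodes("//table[@class='inputsection']");
                // SelectNodes returns null when nothing matches, so guard before iterating.
                if (tables != null)
                {
                    foreach (HtmlNode node in tables)
                    {
                        Debug.WriteLine(node.InnerHtml);
                        Debug.WriteLine("+++++++++++++");
                    }
                }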

Is there a simple way to parse the HTML and load the JavaScript array variable into a C# array?

Cannon
  • Does the JavaScript depend on the HTML? If it does, I think a good option would be a browser automation framework like Selenium. If not, you can replicate the logic inside the `createMarket` function and parse the arguments, or extract the entire JavaScript and run it in a separate engine, like V8. – sergiogarciadev May 13 '14 at 15:27
  • I believe that [HTMLAgilityPack](http://htmlagilitypack.codeplex.com/) will do what you want. – crthompson May 13 '14 at 15:27
  • Tried but did not work. Updated with my code and actual html that I am trying to parse. – Cannon May 13 '14 at 16:21
  • First extract the script as recommended in the duplicate question then try to run it using an [embedded JavaScript engine](http://stackoverflow.com/q/172753/12892). – Cristian Ciupitu May 13 '14 at 18:53
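
Following the comment about replicating `createMarket` and parsing its arguments, here is a minimal sketch that skips the HTML parser and runs a regular expression over the raw `response` string. `MarketInfo`, `MarketInfoParser`, and the property names `Name`, `Id`, and `Status` are hypothetical. The meaning of the three arguments is only a guess from the single sample line in the question, and the pattern assumes every entry has exactly three single-quoted arguments with no escaped quotes inside them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one createMarket(...) call; property names are guesses.
public sealed class MarketInfo
{
    public string Name { get; set; }
    public string Id { get; set; }
    public string Status { get; set; }
}

public static class MarketInfoParser
{
    // Matches: marketInfos[<n>] = createMarket('<a>','<b>','<c>')
    // \s* also spans the line break between "=" and "createMarket".
    // Assumption: three single-quoted arguments, no embedded quotes.
    private static readonly Regex CallPattern = new Regex(
        @"marketInfos\[\d+\]\s*=\s*createMarket\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)",
        RegexOptions.Compiled);

    public static MarketInfo[] Parse(string html)
    {
        var result = new List<MarketInfo>();
        foreach (Match m in CallPattern.Matches(html))
        {
            result.Add(new MarketInfo
            {
                Name = m.Groups[1].Value,
                Id = m.Groups[2].Value,
                Status = m.Groups[3].Value
            });
        }
        return result.ToArray();
    }
}
```

It would be called as `MarketInfo[] markets = MarketInfoParser.Parse(response);`. Since this works on the raw text, it does not matter whether the HTML parser exposes the script node. If the real page has entries with a different argument count or escaped quotes, the pattern would need adjusting, or an embedded JavaScript engine may be the safer route.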

0 Answers