I have the following HTML file. I want to get each H2 (Standard (Flexible Rate) from 139 € and Executive (Flexible Rate) from 169 €) together with its Room Only (labelled "Only Bed" in the markup) and Breakfast Included rows.

Then I want to put the Room Only and Breakfast Included prices into objects, so that I have Standard with its two prices (Room Only and Breakfast Included), and the same for Executive.

I tried Fizzler with HtmlAgilityPack, but I couldn't get the correct results. Could you please suggest an approach, or a good parser for this case? Thanks

<div id="accordionResizer" style="padding:5px; height:300px; border-radius:6px;" class="ui-widget-content regestancias">
  <div id="accordion" class="dias">
    <h2>
      <a href="#">
        Standard (Flexible Rate) from 139 €
      </a>
    </h2>
    <div class="estancias_precios estancias_precios_new">
      <table style="width: 285px;">
        <tr class="" title="">
          <cont>
            <td style="width: 25px;">
              <input type="radio" name="estancias" id="tarifa602385" elem="tarifa" idelem="602" idreg="385" precio="139" reg="Only%20Bed" nombre="Standard%20%28Flexible%20Rate%29" />
            </td>
            <td style="width: 155px;">
              <label class="descrip" for="tarifa602385" precio="139.00" reg="Only%20Bed" nombre="Standard%20%28Flexible%20Rate%29">
                Only Bed
              </label>
            </td>
            <td style="width: 55px;"></td>
            <td style="width: 55px;">
              <strong class="precios_mos">139.00 €</strong>
            </td>
          </cont>
        </tr>
        <tr class="" title="">
          <cont>
            <td style="width: 25px;">
              <input type="radio" name="estancias" id="tarifa602386" elem="tarifa" idelem="602" idreg="386" precio="156.9" reg="Breakfast%20Included" nombre="Standard%20%28Flexible%20Rate%29" />
            </td>
            <td style="width: 155px;">
              <label class="descrip" for="tarifa602386" precio="156.90" reg="Breakfast%20Included" nombre="Standard%20%28Flexible%20Rate%29">
                Breakfast Included
              </label>
            </td>
            <td style="width: 55px;"></td>
            <td style="width: 55px;">
              <strong class="precios_mos">156.90 €</strong>
            </td>
          </cont>
        </tr>
      </table>
    </div>
    <h2>
      <a href="#">
        Executive (Flexible Rate) from 169 €
      </a>
    </h2>
    <div class="estancias_precios estancias_precios_new">
      <table style="width: 285px;">
        <tr class="" title="">
          <cont>
            <td style="width: 25px;">
              <input type="radio" name="estancias" id="tarifa666385" elem="tarifa" idelem="666" idreg="385" precio="169" reg="Only%20Bed" nombre="Executive%20%28Flexible%20Rate%29" />
            </td>
            <td style="width: 155px;">
              <label class="descrip" for="tarifa666385" precio="169.00" reg="Only%20Bed" nombre="Executive%20%28Flexible%20Rate%29">
                Only Bed
              </label>
            </td>
            <td style="width: 55px;"></td>
            <td style="width: 55px;">
              <strong class="precios_mos">169.00 €</strong>
            </td>
          </cont>
        </tr>
        <tr class="" title="">
          <cont>
            <td style="width: 25px;">
              <input type="radio" name="estancias" id="tarifa666386" elem="tarifa" idelem="666" idreg="386" precio="186.9" reg="Breakfast%20Included" nombre="Executive%20%28Flexible%20Rate%29" />
            </td>
            <td style="width: 155px;">
              <label class="descrip" for="tarifa666386" precio="186.90" reg="Breakfast%20Included" nombre="Executive%20%28Flexible%20Rate%29">
                Breakfast Included
              </label>
            </td>
            <td style="width: 55px;"></td>
            <td style="width: 55px;">
              <strong class="precios_mos">186.90 €</strong>
            </td>
          </cont>
        </tr>
      </table>
    </div>
  </div>
</div>
asked by bluewonder

1 Answer

Here is a quick-and-dirty approach with HtmlAgilityPack:

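    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using HtmlAgilityPack;

    class RoomInfo
    {
        public string Name { get; set; }
        // Board type ("Only Bed", "Breakfast Included") -> price
        public Dictionary<string, double> Prices { get; set; }
    }

    static class RoomParser
    {
        public static List<RoomInfo> HtmlFile()
        {
            List<RoomInfo> rooms = new List<RoomInfo>();

            HtmlDocument document = new HtmlDocument();
            document.Load("file.txt");

            var h2Nodes = document.DocumentNode.SelectNodes("//h2");
            if (h2Nodes == null)        // SelectNodes returns null when nothing matches
                return rooms;

            foreach (var h2Node in h2Nodes)
            {
                RoomInfo roomInfo = new RoomInfo
                {
                    Name = h2Node.InnerText.Trim(),
                    Prices = new Dictionary<string, double>()
                };

                // The price table is the first <div> after the <h2>; selecting it by axis
                // avoids counting whitespace text nodes between the two elements.
                var priceDiv = h2Node.SelectSingleNode("following-sibling::div[1]");
                var labels = priceDiv?.SelectNodes(".//label");
                if (labels != null)
                {
                    foreach (var label in labels)
                    {
                        // "precio" uses '.' as the decimal separator, hence InvariantCulture
                        roomInfo.Prices.Add(label.InnerText.Trim(),
                            Convert.ToDouble(label.Attributes["precio"].Value, CultureInfo.InvariantCulture));
                    }
                }
                rooms.Add(roomInfo);
            }
            return rooms;
        }
    }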

The rest is up to you! ;-)
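A variant that does not depend on where the price <div> sits relative to its <h2>: every <label class="descrip"> in the question's markup already carries the room name (nombre), the board type (reg) and the price (precio), URL-encoded. The following is a minimal sketch that groups such attribute triples with LINQ. The class and method names are hypothetical, and reading the attributes from the page (for example with HtmlAgilityPack's GetAttributeValue("nombre", "") on each node returned by SelectNodes("//label[@class='descrip']")) is left out so the sketch needs only the .NET base library. It assumes each room/board pair appears once; a duplicate would make ToDictionary throw.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: groups the attribute values of each <label class="descrip">
// into room name -> (board type -> price).
public static class RateGrouping
{
    public static Dictionary<string, Dictionary<string, double>> Group(
        IEnumerable<(string Nombre, string Reg, string Precio)> labels)
    {
        return labels
            // "Standard%20%28Flexible%20Rate%29" -> "Standard (Flexible Rate)"
            .GroupBy(l => Uri.UnescapeDataString(l.Nombre))
            .ToDictionary(
                room => room.Key,
                room => room.ToDictionary(
                    // "Only%20Bed" -> "Only Bed"
                    l => Uri.UnescapeDataString(l.Reg),
                    // precio uses '.' as the decimal separator
                    l => double.Parse(l.Precio, CultureInfo.InvariantCulture)));
    }
}
```

Fed with the four labels from the question, this yields two entries, "Standard (Flexible Rate)" and "Executive (Flexible Rate)", each mapping "Only Bed" and "Breakfast Included" to its price (139.00/156.90 and 169.00/186.90).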

answered by TheCutter
  • Thank you for your great approach; it works well on my example. However, I have to deal with the whole raw content of the HTML page, where the XPath can also match other tags. Could you please tell me how to isolate just the HTML content shown above within the raw HTML file, or what I need to change in the code to handle the whole HTML file? The HTML file is here: http://notepad.cc/share/dlEOcHwncJ Thanks @TheCutter – bluewonder Apr 04 '14 at 13:31
  • Just change the line var h2Nodes = document.DocumentNode.SelectNodes("//h2"); to var h2Nodes = document.DocumentNode.SelectNodes("//div[@class='dias']/h2"); – TheCutter Apr 04 '14 at 13:40
  • Thanks @TheCutter, I have changed the code as follows, but I only get the first element, Only Bed. Could you suggest what is wrong? http://notepad.cc/share/DorsZ3nxZx – bluewonder Apr 04 '14 at 14:38
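HtmlAgilityPack evaluates XPath through .NET's System.Xml.XPath navigator, so the scoped expression from the comment, combined with a following-sibling::div[1] lookup for each heading's price table, can be checked against the question's markup using the built-in XmlDocument. This is only a way to test the expressions: it works because the trimmed snippet below happens to be well-formed XML, whereas a real page should still be loaded with HtmlAgilityPack. The class name and the trimmed markup are illustrative. If every heading yields both rows here but not on the full page, the expressions are probably not the culprit.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Illustrative check of the XPath expressions on a trimmed copy of the question's markup.
public static class XPathCheck
{
    // Well-formed subset of the question's HTML (attributes not used below are dropped).
    public const string Snippet =
        "<div id='accordionResizer'><div id='accordion' class='dias'>" +
        "<h2><a href='#'>Standard (Flexible Rate) from 139 \u20AC</a></h2>" +
        "<div class='estancias_precios estancias_precios_new'><table>" +
        "<tr><cont><td><label class='descrip' precio='139.00'>Only Bed</label></td></cont></tr>" +
        "<tr><cont><td><label class='descrip' precio='156.90'>Breakfast Included</label></td></cont></tr>" +
        "</table></div>" +
        "<h2><a href='#'>Executive (Flexible Rate) from 169 \u20AC</a></h2>" +
        "<div class='estancias_precios estancias_precios_new'><table>" +
        "<tr><cont><td><label class='descrip' precio='169.00'>Only Bed</label></td></cont></tr>" +
        "<tr><cont><td><label class='descrip' precio='186.90'>Breakfast Included</label></td></cont></tr>" +
        "</table></div>" +
        "</div></div>";

    // Returns one "heading: board=price" line per label under each <h2> in div.dias.
    public static List<string> Extract(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var lines = new List<string>();
        foreach (XmlNode h2 in doc.SelectNodes("//div[@class='dias']/h2"))
        {
            // The heading's price table is the first <div> among its following siblings.
            XmlNode priceDiv = h2.SelectSingleNode("following-sibling::div[1]");
            if (priceDiv == null)
                continue;

            foreach (XmlNode label in priceDiv.SelectNodes(".//label"))
            {
                lines.Add(h2.InnerText.Trim() + ": " + label.InnerText.Trim()
                          + "=" + label.Attributes["precio"].Value);
            }
        }
        return lines;
    }
}
```

On this snippet both headings come back with their two rows, four lines in total.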