
I'm trying to parse out values from a large HTML page and I'm struggling with how to extract text from between two selectors. Here's my example HTML to illustrate:

<table class="categories">
<tr class="category">
    <td class="categoryTitle">Category #1</td>
    <td class="categoryDate">12-1-2012</td>
    <td class="categoryFoos">212</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Blue</td>
    <td class="catItemSprockets">17</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">454</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
<tr class="category">
    <td class="categoryTitle">Category #2</td>
    <td class="categoryDate">12-17-2012</td>
    <td class="categoryFoos">311</td>       
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #1</div></td>
    <td class="catItemColor">Yellow</td>
    <td class="catItemSprockets">73</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #2</div></td>
    <td class="catItemColor">Red</td>
    <td class="catItemSprockets">5</td>
</tr>
<tr class="catItem">
    <td class="catItemName"><div class="itemName">Category Item #3</div></td>
    <td class="catItemColor">Purple</td>
    <td class="catItemSprockets">11</td>
</tr>
</table>

How would I go about taking an ICsqWebResponse and parsing out each Category, with its title, date and 'foos', along with all of the Items in each Category as a collection? Just so it's clear what I'm trying to end up with, the object should look something like this:

Categories = {
    Category #1 { 
       Date: 12-1-2012,
       Foos: 212,
       Items: [
          Category Item #1 {
             Color: Blue,
             Sprockets: 17
          },
          Category Item #2 {
             Color: Red,
             Sprockets: 454
          },
          ... more items ...
       ]
     },
     Category #2 {
        Date: 12-17-2012,
        Foos: 311,
        Items: [
            Category Item #1 {
                Color: Yellow,
                Sprockets: 73
            },
            Category Item #2 {
                Color: Red,
                Sprockets: 5
            },
            Category Item #3 {
                Color: Purple,
                Sprockets: 11
            }
        ]
     }
 }
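Here is one possible set of C# classes for the structure above, as a minimal sketch. The class and property names are hypothetical, chosen to mirror the HTML class names. ParseDate assumes every date on the page uses the month-day-year format shown (for example 12-1-2012).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical model for one <tr class="catItem"> row.
public class CategoryItem
{
    public string Name { get; set; }
    public string Color { get; set; }
    public int Sprockets { get; set; }
}

// Hypothetical model for one <tr class="category"> row plus the item rows that follow it.
public class Category
{
    public string Title { get; set; }
    public DateTime Date { get; set; }
    public int Foos { get; set; }
    public List<CategoryItem> Items { get; } = new List<CategoryItem>();

    // Parses month-day-year dates such as "12-1-2012"; "M" and "d" accept one or two digits.
    public static DateTime ParseDate(string text) =>
        DateTime.ParseExact(text.Trim(), "M-d-yyyy", CultureInfo.InvariantCulture);
}
```

Typed Date and Foos properties keep the parsing in one place. The scraping code only has to pass the cell text to ParseDate or int.Parse.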
Eddie

2 Answers


Using the CsQuery library, you would loop through all of the rows:

    using System.Collections.Generic;
    using CsQuery;

    CQ dom = "<table> ...your html... </table>"; // or CQ.CreateFromUrl("http://www.jquery.com");
    CQ rows = dom["tr"]; // a CQ is already enumerable, so no ToList() is needed

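Whenever a row is a category row, start a new category, and add every catItem row that follows it to that category. (Category and CategoryItem are your own model classes.)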

    var categoryList = new List<Category>();
    Category currentCategory = null; // 'var x = null;' does not compile in C#

    foreach (IDomObject r in rows)
    {
        // wrap the row so we can query inside it; CsQuery can test
        // the class attribute directly, so no regex is needed
        CQ row = r.Cq();

        if (currentCategory != null && row.HasClass("catItem"))
        {
            var item = new CategoryItem();
            item.Name = row.Find(".itemName").Text();
            item.Color = row.Find(".catItemColor").Text();
            ...

            currentCategory.Items.Add(item);
        }
        else if (row.HasClass("category"))
        {
            currentCategory = new Category();
            currentCategory.Title = row.Find(".categoryTitle").Text();
            currentCategory.Date = row.Find(".categoryDate").Text();
            currentCategory.Foos = row.Find(".categoryFoos").Text();
            ...

            categoryList.Add(currentCategory);
        }
    }

Disclaimer: this is not production-ready code ;-)
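The row-by-row grouping in this answer does not depend on CsQuery itself. The sketch below is self-contained and assumes each <tr> has already been reduced to its class attribute and the text of its cells. The RowGrouper, Row, Item and Group names are hypothetical stand-ins, not CsQuery types.

```csharp
using System.Collections.Generic;

public static class RowGrouper
{
    // Hypothetical flattened view of one <tr>: its class attribute and the text of each <td>.
    public class Row
    {
        public string ClassName { get; }
        public string[] Cells { get; }

        public Row(string className, params string[] cells)
        {
            ClassName = className;
            Cells = cells;
        }
    }

    public class Item
    {
        public string Name { get; set; }
        public string Color { get; set; }
        public string Sprockets { get; set; }
    }

    public class Group
    {
        public string Title { get; set; }
        public string Date { get; set; }
        public string Foos { get; set; }
        public List<Item> Items { get; } = new List<Item>();
    }

    // Walks the rows in document order: a "category" row opens a new group and
    // every following "catItem" row is appended to the most recently opened group.
    public static List<Group> GroupRows(IEnumerable<Row> rows)
    {
        var groups = new List<Group>();
        Group current = null;

        foreach (var row in rows)
        {
            if (row.ClassName == "category")
            {
                current = new Group { Title = row.Cells[0], Date = row.Cells[1], Foos = row.Cells[2] };
                groups.Add(current);
            }
            else if (row.ClassName == "catItem" && current != null)
            {
                // catItem rows that appear before any category row are skipped.
                current.Items.Add(new Item { Name = row.Cells[0], Color = row.Cells[1], Sprockets = row.Cells[2] });
            }
        }
        return groups;
    }
}
```

The values stay as strings, matching the Text() calls above. Converting Date and Foos to DateTime and int is left to the caller.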

edi spring

If I understood what you're trying to do...

    using CsQuery;

    CQ html = "your html here";
    html[".category"].Each((index, dom) => {

        var category = dom.Cq(); // everything below is scoped to this row
        // you will need to use the .Find() function, NOT '[]' or Select(), because
        // those get values from the whole document, not just from your category

        string categoryTitle = category.Find(".categoryTitle").Text();
        string categoryDate = category.Find(".categoryDate").Text();
        // and so on...

        // now loop through the catItem rows; they are siblings of the category
        // row, not children, so take the rows up to the next category row
        category.NextUntil(".category").Each((catIndex, catDom) => {

            var catItem = catDom.Cq();
            // the same principle applies here
        });
    });
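This approach works only if each category's item rows can be located relative to the category row. Because the catItem rows are siblings rather than descendants, a Find() inside the category row would not reach them. The sketch below shows the same sibling slicing without CsQuery and runs over a plain list of row class names. SiblingSlices and Slice are hypothetical names. In real code the class names would come from the parsed rows.

```csharp
using System.Collections.Generic;

public static class SiblingSlices
{
    // For every row whose class is "category", collects the indices of the "catItem" rows
    // that follow it, stopping at (and excluding) the next "category" row. This mirrors a
    // nextUntil(".category") traversal, keeping only the catItem rows.
    public static List<(int CategoryIndex, List<int> ItemIndices)> Slice(IReadOnlyList<string> rowClasses)
    {
        var result = new List<(int CategoryIndex, List<int> ItemIndices)>();
        for (int i = 0; i < rowClasses.Count; i++)
        {
            if (rowClasses[i] != "category") continue;

            var items = new List<int>();
            for (int j = i + 1; j < rowClasses.Count && rowClasses[j] != "category"; j++)
            {
                if (rowClasses[j] == "catItem") items.Add(j);
            }
            result.Add((i, items));
        }
        return result;
    }
}
```

This rescans the rows after each category, while the single-pass loop in the other answer visits each row once. For a table of this size either is fine.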
nazarkin659