Extract one part of HTML in C#

Question

I want to extract one part of an HTML page: the ul with class="list-2".

<!DOCTYPE html>
<html>
    <title>Title</title>
    <body>
        <div>
            <ul class="list-1">
                <li class="item">1</li>
                <li class="item">2</li>
                <li class="item">3</li>
            </ul>
            <ul class="list-2">
                <li class="item">11</li>
                <li class="item">22</li>
                <li class="item">33</li>
            </ul>
            <ul class="list-1">
                <li class="item">111</li>
                <li class="item">222</li>
                <li class="item">333</li>
            </ul>
        </div>
    </body>
</html>

Here I download all the HTML from the page:

string url = Request.QueryString["url"];
WebClient web = new WebClient();
web.Encoding = System.Text.Encoding.GetEncoding("utf-8");
string html = web.DownloadString(url);

Here I can remove everything before my ul:

html = html.Remove(0, html.IndexOf("<ul class=\"list-2\">"));

How do I get only the code of this ul?

Thanks in advance!

Yes, seriously, use HtmlAgilityPack. It will take 30 minutes to learn the package, but you'll have it in your toolbox for the future. — trailmax, Feb 27 '14 at 17:36
You should use one of the many (X)HTML parsers out there and select the elements of your interest through XPath. For the love of what's holy [do not use regular expressions](http://stackoverflow.com/a/1732454/91696). — Albireo, Feb 27 '14 at 17:33
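
The HtmlAgilityPack and XPath suggestions in the comments above could look roughly like the following. This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is installed; the class and method names (`HapExtractor`, `ExtractList`) are hypothetical and only for illustration.

```csharp
using HtmlAgilityPack;

public static class HapExtractor
{
    // Returns the outer HTML of the first <ul class="list-2">,
    // or null when the page contains no such element.
    public static string ExtractList(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: the first <ul> whose class attribute is exactly "list-2"
        var node = doc.DocumentNode.SelectSingleNode("//ul[@class='list-2']");

        // SelectSingleNode returns null when nothing matches
        return node?.OuterHtml;
    }
}
```

The string returned by `WebClient.DownloadString` in the question can be passed straight to `ExtractList`. Note that the XPath test is an exact match on the attribute value, so an element with several classes (for example class="list-2 wide") would not be found.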

Asons · Accepted Answer · 2015-11-13T08:49:50.760

Today, in late 2015, there are a few more HTML parsers (and headless browsers) that can do this; AngleSharp, a parser, is one of them.

Note that when using WebClient, no JavaScript is executed, so any content generated by scripts will not be in the downloaded HTML.

This sample extracts the tag from a string (in this case, the string html):

// --------- your code
string url = Request.QueryString["url"];
WebClient web = new WebClient();
web.Encoding = System.Text.Encoding.GetEncoding("utf-8");
string html = web.DownloadString(url);

// --------- parser code
var parser = new HtmlParser();
var document = parser.Parse(html);

// Get the first matching tag with a CSS selector
var ultag = document.QuerySelector("ul.list-2");

// Get the tag's html string
var ultag_html = ultag.ToHtml();
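
For readers on a newer AngleSharp release, here is a hedged variant of the sample above. As far as I know, from version 0.10 onward `HtmlParser` lives in the `AngleSharp.Html.Parser` namespace and `Parse` was renamed to `ParseDocument`; treat that as an assumption and check it against the version you have installed. The sketch also guards against the selector matching nothing, since `QuerySelector` returns null in that case. The names `ListExtractor` and `ExtractList` are hypothetical.

```csharp
using AngleSharp.Html.Parser; // assumed namespace for HtmlParser in AngleSharp 0.10+

public static class ListExtractor
{
    // Returns the outer HTML of the first <ul class="list-2">,
    // or null when the document contains no such element.
    public static string ExtractList(string html)
    {
        var parser = new HtmlParser();

        // ParseDocument is assumed to be the 0.10+ name for the older Parse method
        var document = parser.ParseDocument(html);

        // QuerySelector returns the first match, or null if there is none
        var ul = document.QuerySelector("ul.list-2");

        return ul?.OuterHtml;
    }
}
```

Because a CSS class selector is used, this also matches elements that carry additional classes besides list-2.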

This sample loads the web page and extracts the tag (because it uses await, it has to run inside an async method):

// Setup the configuration to support document loading
var config = Configuration.Default.WithDefaultLoader();

// Load a web page
var address = "a URL";

// Asynchronously get the document in a new context using the configuration
var document = await BrowsingContext.New(config).OpenAsync(address);

// This CSS selector gets the desired content
var cssSelector = "ul.list-2";

// Perform the query to get the first tag that matches the selector
var ultag = document.QuerySelector(cssSelector);

// Get the tag's html string
var ultag_html = ultag.ToHtml();
