Get the value of a specific HTML tag (span tag) in C#

Question

I am developing a Google Translate app for Windows Phone 8. I want to get the text of all the span tags nested inside the span with id="result_box", in C#.

<html>
.
.
<span id="result_box" class="short_text" lang="pt">
        <span class="hps">
            Olá
        </span>
        <span class="">
            .
        </span>
        <span class="hps">
            oi
        </span>
    </span>
.
.
</html>

I tried this, but it is not working:

html = e.Result;
var r = new Regex(@"(?i)<span[^>]*?>\s*", RegexOptions.IgnoreCase);
string capture = r.Match(html).Groups[1].Value;
MessageBox.Show(capture);

Please suggest a regex. If possible, please give me a full function that returns the text.

[You can't parse HTML with Regex](http://stackoverflow.com/a/1732454/1652345). Use an HTML parser instead. — pascalhein, Aug 24 '14 at 11:33
@user2731312 [welcome to SO and don't forget to check the Tour page to have a better experience knowing how to use this website.](http://stackoverflow.com/tour) — Prix, Aug 24 '14 at 12:02
@user3218114 excuse me, there are so many users with "user" usernames that I sometimes forget to check ;) — Prix, Aug 24 '14 at 12:04

score -1 · Answer 1 · answered Aug 28 '14 at 19:23

What about this?

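        // Requires: using System.Text.RegularExpressions;
        // Matches every span in the page whose content is plain text (the innermost spans).
        Regex r = new Regex(@"<span\b[^>]*>([^<]*)</span>", RegexOptions.IgnoreCase);

        foreach (Match matchedSpan in r.Matches(html))
        {
            // Group 1 is the text between the tags; trim the surrounding whitespace.
            string capture = matchedSpan.Groups[1].Value.Trim();
            MessageBox.Show(capture);
        }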

Diego


As explained in the comments, using regex to process HTML is an extraordinarily bad idea. – mason Aug 28 '14 at 19:26

score -3 · Answer 2 · answered Aug 28 '14 at 19:40

OK, since @mason didn't like the previous answer, here goes another approach:

        XmlDocument htmlXML = new XmlDocument();
        // LoadXml throws an XmlException unless the markup is well-formed XML.
        htmlXML.LoadXml(html);
        foreach (XmlNode spanElement in htmlXML.SelectNodes("//span[@class='short_text']/span"))
        {
            MessageBox.Show(spanElement.InnerText);
        }

Remember to add:

using System.Xml;

Diego


-1 No, an HTML document is NOT an XML document, unless it's XHTML. But even then, there's no guarantee that the markup is going to be compliant. There are dedicated libraries out there ([HTMLAgilityPack](http://www.4guysfromrolla.com/articles/011211-1.aspx)) for parsing HTML that are fault tolerant. – mason Aug 28 '14 at 20:57
All necessary help is in the comments. – mason Sep 01 '14 at 13:58
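
For reference, the HTML-parser route suggested in the comments could look roughly like the sketch below. It assumes the third-party HtmlAgilityPack NuGet package is installed and that the page still wraps the translation in a span with id="result_box", as in the question. The class and method names are made up for illustration. It uses LINQ rather than XPath, on the assumption that XPath support may not be available in every build of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

// Hypothetical helper; the names are illustrative and not from the original thread.
public static class ResultBoxReader
{
    // Returns the trimmed text of each span that is a direct child of
    // the span with id="result_box", or an empty list if there is none.
    public static List<string> ExtractSpanTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of markup that is not well-formed

        // Find the container span by its id attribute.
        HtmlNode box = doc.DocumentNode
            .Descendants("span")
            .FirstOrDefault(s => s.GetAttributeValue("id", "") == "result_box");

        if (box == null)
        {
            return new List<string>();
        }

        // Take only the direct child spans, decode entities, and trim whitespace.
        return box.Elements("span")
            .Select(s => HtmlEntity.DeEntitize(s.InnerText).Trim())
            .ToList();
    }
}
```

Calling `ResultBoxReader.ExtractSpanTexts(e.Result)` on the markup from the question should give one string per inner span, which can then be joined or shown as needed.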
