1

I am developing a Google Translate app for Windows Phone 8. In C#, I want to get the text of all the span tags nested inside the span with id="result_box".

<html>
.
.
<span id="result_box" class="short_text" lang="pt">
        <span class="hps">
            Olá
        </span>
        <span class="">
            .
        </span>
        <span class="hps">
            oi
        </span>
    </span>
.
.
</html>

I tried this, but it is not working:

html = e.Result;
var r = new Regex(@"(?i)<span[^>]*?>\s*", RegexOptions.IgnoreCase);
string capture = r.Match(html).Groups[1].Value;
MessageBox.Show(capture);

Please suggest a regex. If possible, please give me a full function that returns the text.

CodeRunner

2 Answers

-1

What about this?

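        // Match each <span> that contains only text (no nested tags) and capture that text.
        Regex r = new Regex(@"<span\b[^>]*>([^<]*)</span>", RegexOptions.IgnoreCase);

        foreach (Match matchedSpan in r.Matches(html))
        {
            // Trim the whitespace that surrounds the text in the markup.
            string capture = matchedSpan.Groups[1].Value.Trim();
            MessageBox.Show(capture);
        }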
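
Since the question asks for a full function that returns the text, here is a minimal sketch of the same idea wrapped in a method. The class and method names (`SpanTextExtractor`, `ExtractLeafSpanTexts`) are made up for illustration. It assumes the inner spans contain only plain text, and it collects every such span in the page, not only those inside `result_box`, so it is fragile against any change in the markup.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpanTextExtractor
{
    // Matches a <span> whose content has no nested tags and captures that content.
    private static readonly Regex LeafSpan =
        new Regex(@"<span\b[^>]*>([^<]*)</span>", RegexOptions.IgnoreCase);

    // Returns the trimmed text of every leaf <span> found in the given HTML.
    public static List<string> ExtractLeafSpanTexts(string html)
    {
        var texts = new List<string>();
        foreach (Match match in LeafSpan.Matches(html))
        {
            string text = match.Groups[1].Value.Trim();
            if (text.Length > 0)
            {
                texts.Add(text);
            }
        }
        return texts;
    }
}
```

Called as `SpanTextExtractor.ExtractLeafSpanTexts(e.Result)`, the pieces can then be joined with `string.Join(" ", ...)` if a single string is wanted.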
Diego
  • As explained in the comments, using regex to process HTML is an extraordinarily bad idea. – mason Aug 28 '14 at 19:26
-3

OK, since @mason didn't like the previous answer, here goes another approach:

        XmlDocument htmlXML = new XmlDocument();
        htmlXML.LoadXml(html);
        foreach (XmlNode spanElement in htmlXML.SelectNodes("//span[@class='short_text']/span"))
        {
            MessageBox.Show(spanElement.InnerText);
        }

Remember to add:

using System.Xml;
Diego
  • -1 No, an HTML document is NOT an XML document, unless it's XHTML. But even then, there's no guarantee that the markup is going to be compliant. There are dedicated libraries out there ([HTMLAgilityPack](http://www.4guysfromrolla.com/articles/011211-1.aspx)) for parsing HTML that are fault tolerant. – mason Aug 28 '14 at 20:57
  • All necessary help is in the comments. – mason Sep 01 '14 at 13:58
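
For reference, the parser-based approach suggested in the comments might look roughly like the sketch below. It assumes the Html Agility Pack NuGet package is referenced, and the class and method names (`TranslationParser`, `GetResultSpans`) are hypothetical. It uses the LINQ-style `Descendants`/`Elements` calls instead of XPath, on the assumption that XPath support may not be available in every build of the library.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TranslationParser
{
    // Returns the trimmed text of each <span> directly inside <span id="result_box">.
    public static List<string> GetResultSpans(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of markup that is not well-formed XML

        // Locate the container span by its id attribute.
        HtmlNode resultBox = doc.DocumentNode
            .Descendants("span")
            .FirstOrDefault(s => s.GetAttributeValue("id", "") == "result_box");

        if (resultBox == null)
        {
            return new List<string>();
        }

        // Take only the direct child spans and decode entities such as &aacute;.
        return resultBox.Elements("span")
            .Select(s => HtmlEntity.DeEntitize(s.InnerText).Trim())
            .ToList();
    }
}
```

Unlike the `XmlDocument` attempt, this does not require the page to be valid XML, and it selects by `id="result_box"` rather than by the `short_text` class.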