
I have an HTML string that has the following form:

<tr valign="top"><td colspan="2"  style="padding-bottom:5px;text-align: left"><label for="base_1001013" style="margin-bottom: 3px; float: left">Nom d'utilisateur:&nbsp;</label><span style="float: right;"><input class="PersonalDetailsClass" type="text" name="base_1001013" id="base_1001013" value="" /></span></td></tr>  

(sorry for the formatting..)

Anyhow, I need to get the text content, i.e. the part that isn't tag markup: Nom d'utilisateur (ideally without the "&nbsp;", but that's negligible).

The number of tags and their values may vary, and so may the number of words in the requested string, and even its language.

I'm not sure whether this is a regex question, an XML question, or a C# string-manipulation question (I don't have a specific preference), but I'd rather not use a third-party DLL (as I've seen is sometimes done to parse HTML in C#).

How do I get the value?

Thanks.

Oren A
  • Why do you prefer not to use a third-party DLL? – svick Oct 24 '10 at 14:56
  • @tvanfosson is correct: using a regex instead of a DOM parser to parse HTML or XML will bring you nothing but pain. It's not just reinventing the wheel, it's reinventing a wheel using LEGO blocks. Regex is a great tool; it's just not the right tool for this job. – TrueWill Oct 24 '10 at 15:14

1 Answer


You should use HtmlAgilityPack and then get the text value of the row. Its InnerText property strips all of the HTML elements in the snippet, leaving only the text.

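using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(stringWithHtml);
var element = doc.DocumentNode.ChildNodes["tr"];
// InnerText drops the tags; DeEntitize turns &nbsp; into a non-breaking space, which Trim() then removes
var text = HtmlEntity.DeEntitize(element.InnerText).Trim();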

Note that you may need to play around with the navigation to the desired element depending on your actual HTML.
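If a third-party DLL really is ruled out, as the question prefers, the "XML question" route can be sketched with only System.Xml.Linq from the base class library. This is a sketch under a strong assumption: the snippet must be well-formed XHTML, as the one in the question happens to be. The only fix-up done is replacing &nbsp; (an HTML entity that XML does not predefine) with its numeric form &#160;. Typical real-world HTML (unclosed <br> or <p> tags, unquoted attributes, other named entities) will make XElement.Parse throw, which is exactly the case a real HTML parser such as HtmlAgilityPack handles. The class and method names (LabelTextExtractor, ExtractText) are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: BCL-only text extraction for well-formed XHTML snippets.
public static class LabelTextExtractor
{
    public static string ExtractText(string html)
    {
        // XML only predefines &lt; &gt; &amp; &quot; &apos;, so map &nbsp; to its numeric form.
        // Any other named HTML entity would still make Parse throw (assumption: none are present).
        var xml = html.Replace("&nbsp;", "&#160;");

        // Throws XmlException if the snippet is not well-formed XML.
        var root = XElement.Parse(xml);

        // Concatenate every text node, however many tags surround it.
        var text = string.Concat(
            root.DescendantNodes().OfType<XText>().Select(t => t.Value));

        // string.Trim() treats U+00A0 (the decoded &nbsp;) as whitespace.
        return text.Trim();
    }

    public static void Main()
    {
        const string html = @"<tr valign=""top""><td colspan=""2""  style=""padding-bottom:5px;text-align: left""><label for=""base_1001013"" style=""margin-bottom: 3px; float: left"">Nom d'utilisateur:&nbsp;</label><span style=""float: right;""><input class=""PersonalDetailsClass"" type=""text"" name=""base_1001013"" id=""base_1001013"" value="""" /></span></td></tr>";

        Console.WriteLine(ExtractText(html)); // prints: Nom d'utilisateur:
    }
}
```

Because it collects all text nodes rather than looking for a specific tag, the number and nesting of tags can vary; if the row ever contains more than one piece of text, you would select a specific element instead, e.g. root.Descendants("label").First().Value.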

tvanfosson