Get the HTML value which is not in a tag in C#

Question

I have an HTML string that has the following form:

<tr valign="top"><td colspan="2"  style="padding-bottom:5px;text-align: left"><label for="base_1001013" style="margin-bottom: 3px; float: left">Nom d'utilisateur:&nbsp;</label><span style="float: right;"><input class="PersonalDetailsClass" type="text" name="base_1001013" id="base_1001013" value="" /></span></td></tr>

(Sorry for the formatting.)

I need to get the text that is not part of any tag, i.e. "Nom d'utilisateur" (without the "&nbsp;", though that part is negligible).

The number of tags and their values may vary, and so may the number of words in the requested string and even its language.

I'm not sure whether this is a regex question, an XML question, or a C# string-manipulation question (I have no specific preference), but I would prefer not to use a third-party DLL (as I have seen is sometimes done to parse HTML in C#).

How do I get the value?

Thanks.

@tvanfosson is correct - using a regex instead of a DOM parser to parse HTML or XML will bring you nothing but pain. It's not just reinventing the wheel, it's reinventing a wheel using LEGO blocks. Regex is a great tool; it's just not the right tool for this job. — TrueWill, Oct 24 '10 at 15:14

tvanfosson · Accepted Answer · 2010-10-24T14:39:06.127


You should use the HtmlAgilityPack to parse the snippet and then read the text value of the row. That strips all of the HTML elements in the snippet and leaves only the text.

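using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(stringWithHtml);
// Pick the top-level <tr> node of the parsed fragment.
var element = doc.DocumentNode.ChildNodes["tr"];
// InnerText concatenates the text nodes and drops the tags; entities such as &nbsp; are left as-is.
var text = element.InnerText;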

Note that you may need to play around with the navigation to the desired element depending on your actual HTML.
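
As a rough sketch of that kind of navigation, assuming the text you want always sits inside a <label> element (an assumption based only on the snippet in the question), you could select the label with an XPath query and decode the entities afterwards. The class and method names here (LabelTextExtractor, GetFirstLabelText) are made up for illustration; SelectSingleNode and HtmlEntity.DeEntitize come from HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class LabelTextExtractor
{
    // Hypothetical helper: returns the decoded, trimmed text of the first
    // <label> in the fragment, or null when the fragment has no <label>.
    public static string GetFirstLabelText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//label" matches a label at any depth; SelectSingleNode returns
        // the first match, or null when nothing matches.
        var label = doc.DocumentNode.SelectSingleNode("//label");
        if (label == null)
        {
            return null;
        }

        // InnerText still contains entities such as &nbsp;, so decode them,
        // turn non-breaking spaces into plain spaces, and trim the ends.
        var decoded = HtmlEntity.DeEntitize(label.InnerText);
        return decoded.Replace('\u00A0', ' ').Trim();
    }
}
```

For the row in the question, GetFirstLabelText(stringWithHtml) should give back the label text without the trailing &nbsp;. If the text can also appear outside a label, the XPath expression would need adjusting to match your real markup.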

edited Oct 24 '10 at 14:39

answered Oct 24 '10 at 14:30

tvanfosson
