
I want to extract the string KLE3KAN918D429 from the following html code:

<td class="Labels"> CODE (Sp Number): </td><td width="40.0%"> KLE3KAN918D429</td>

Is there a method in C# where I can specify the source text, a start string, and an end string, and get the string between start and end?

user3307685
    FYI: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 You're better off using an HTML parser such as HTML Agility Pack. – Steve Aug 16 '14 at 08:21
    http://htmlagilitypack.codeplex.com/ – Paul Zahra Aug 16 '14 at 08:21

3 Answers

As the comments suggest, you are probably better off using a parsing library to walk the DOM, but if you can make some assumptions about the HTML you'll be parsing, you could do something like this:

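var html = "<td class=\"Labels\"> CODE (Sp Number): </td><td width=\"40.0%\"> KLE3KAN918D429</td>";
// Find the label cell, then the '%' in the next cell's width attribute.
var labelIndex = html.IndexOf("<td class=\"Labels\">", StringComparison.Ordinal);
var pctIndex = html.IndexOf('%', labelIndex);
// The value ends at the next tag after the '%'.
var closeIndex = html.IndexOf('<', pctIndex);
// Skip the three characters %"> to reach the value; Trim() removes the leading space.
var key = html.Substring(pctIndex + 3, closeIndex - pctIndex - 3).Trim();
System.Diagnostics.Debug.WriteLine(key);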

This is likely quite brittle, but sometimes quick and dirty is all that is required.
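
The index arithmetic above can be generalized into the start-string/end-string method the question asks for. This is a minimal sketch, assuming ordinal (culture-insensitive) matching and that the first occurrence of each marker is the one you want; StringHelpers and TryGetBetween are hypothetical names, not part of .NET:

```csharp
using System;

public static class StringHelpers
{
    // Hypothetical helper (not part of .NET): finds the first occurrence of `start`,
    // then the first occurrence of `end` after it, and returns the text in between.
    // Returns false (with an empty result) when either marker is missing.
    public static bool TryGetBetween(string source, string start, string end, out string result)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (string.IsNullOrEmpty(start)) throw new ArgumentException("Start marker must not be empty.", nameof(start));
        if (string.IsNullOrEmpty(end)) throw new ArgumentException("End marker must not be empty.", nameof(end));

        result = string.Empty;

        // Ordinal comparison matches the markers character by character, independent of culture.
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return false;

        // Search for the end marker only after the start marker.
        int contentStart = startIndex + start.Length;
        int endIndex = source.IndexOf(end, contentStart, StringComparison.Ordinal);
        if (endIndex < 0) return false;

        result = source.Substring(contentStart, endIndex - contentStart);
        return true;
    }
}
```

For the markup in the question, TryGetBetween(html, "<td width=\"40.0%\">", "</td>", out var code) returns true with code set to " KLE3KAN918D429", so call code.Trim() to drop the leading space. It shares the snippet's brittleness: if the attribute text changes, the start marker no longer matches.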

rism

As others have already suggested, you should use something like HtmlAgilityPack for parsing HTML. Don't use regular expressions or other hacks to parse HTML.

Your HTML string contains several td nodes. Getting the last one is easy with the td[last()] XPath expression:

using HtmlAgilityPack;

string html = "<td class=\"Labels\"> CODE (Sp Number): </td><td width=\"40.0%\"> KLE3KAN918D429</td>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
// Relative XPath from the document node: the last <td> child.
var td = doc.DocumentNode.SelectSingleNode("td[last()]");
var result = td.InnerText.Trim(); // "KLE3KAN918D429"

Sergey Berezovskiy

I strongly suggest using HtmlAgilityPack for this.

It's as easy as:

using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(@"<td class=""Labels""> CODE (Sp Number): </td><td width=""40.0%""> KLE3KAN918D429</td>");

// Find the label cell by class and whitespace-normalized text, then take the next <td>.
var tdNode = doc.DocumentNode.SelectSingleNode("//td[@class='Labels' and normalize-space(text())='CODE (Sp Number):']/following-sibling::td[1]");
Console.WriteLine(tdNode.InnerText.Trim());

Before you start, add HtmlAgilityPack from NuGet:

Install-Package HtmlAgilityPack
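
If adding a NuGet package is not an option and the markup is known to be well-formed XHTML (an assumption that real-world HTML often breaks), similar XPath expressions can be evaluated with the built-in System.Xml.Linq and System.Xml.XPath APIs instead of HtmlAgilityPack. This is a sketch only; XhtmlCells, LastCell, and ValueAfterLabel are hypothetical names, and the wrapping <tr> root is an arbitrary choice:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XhtmlCells
{
    // Assumption: the fragment is well-formed XHTML. It is wrapped in a <tr> root
    // (an arbitrary choice) so XElement.Parse accepts the sibling <td> elements.
    private static XElement ParseRow(string fragment) =>
        XElement.Parse("<tr>" + fragment + "</tr>");

    // Same idea as td[last()]: the text of the last <td> child, trimmed.
    public static string LastCell(string fragment)
    {
        XElement cell = ParseRow(fragment).XPathSelectElement("td[last()]");
        return cell == null ? string.Empty : cell.Value.Trim();
    }

    // The <td> that follows the label cell whose whitespace-normalized text equals `label`.
    // `label` is spliced into the XPath string, so it must not contain a single quote.
    public static string ValueAfterLabel(string fragment, string label)
    {
        string xpath = "td[@class='Labels' and normalize-space(text())='" + label + "']"
                     + "/following-sibling::td[1]";
        XElement cell = ParseRow(fragment).XPathSelectElement(xpath);
        return cell == null ? string.Empty : cell.Value.Trim();
    }
}
```

Unlike HtmlAgilityPack, XElement.Parse throws an XmlException on markup that is not well-formed (for example an unclosed <br> or an unquoted attribute value), which is why the answers recommend a real HTML parser for arbitrary pages.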
Marcel N.