
I need a regular expression to find a match in my string. This is the line that contains the info that I need:

<td width="140" height="18"><a href="users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202" class="folderNav"><strong>087690898</strong></a></td>

What I need to pull out of it is the address in the href, "users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202", and the value stored between the two strong tags, 087690898. So I just need to identify lines that look like this.

So far I have worked it out to this point:

(Match any char or digit) (Match < a href=") (Match any char or digit) (Match class="folderNav">)

From that, I created this regular expression:

[a-z](< a href=")[a-z](class="folderNav">)

Once I have identified this string, I can parse it to pull out the values I need; identifying the string is the part I am having trouble with. I am new to regular expressions and not sure exactly how to do this. I know my regular expression is flawed. I am using C#.

Also, I know you shouldn't use Regex on HTML, but for this, I don't mind a quick and dirty solution.
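For the "identify the line" step alone, here is a minimal sketch of why the attempt above never matches and what a corrected test might look like. `[a-z]` matches exactly one lowercase letter (not "any run of characters"), and `< a` with a space never matches the `<a` in the HTML. The class and method names (`FolderNavLines`, `IsFolderNavLine`) are hypothetical, and `\s+` between the attributes is an assumption that the whitespace there may vary.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for the "does this line contain a folderNav link?" check.
public static class FolderNavLines
{
    // The original attempt: [a-z] matches exactly ONE lowercase letter, and
    // "< a" (with a space) never matches the "<a" that appears in the HTML.
    public static readonly Regex Original =
        new Regex(@"[a-z](< a href="")[a-z](class=""folderNav"">)");

    // Sketch of a fix: no space after "<", [^""]* consumes the whole href value
    // up to its closing quote, and \s+ allows whitespace before "class".
    public static readonly Regex Fixed =
        new Regex(@"<a href=""[^""]*""\s+class=""folderNav"">");

    // True when the line contains an <a ... class="folderNav"> tag.
    public static bool IsFolderNavLine(string line) => Fixed.IsMatch(line);
}
```

Calling `FolderNavLines.IsFolderNavLine(line)` on the sample line returns true, while `FolderNavLines.Original.IsMatch(line)` returns false.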

user489041
Comment: Just because it's similar doesn't mean you *have* to close it as a dupe. In fact, sometimes [you should stop worrying and love them.](http://blog.stackoverflow.com/2010/11/dr-strangedupe-or-how-i-learned-to-stop-worrying-and-love-duplication/) – Feb 06 '12 at 20:43

3 Answers


Although the purists will condemn me to eternal damnation for breaking the regex/HTML rule, here’s what you need:

using System.Text.RegularExpressions;

string line = @"<td width=""140"" height=""18""><a href=""users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202"" class=""folderNav""><strong>087690898</strong></a></td>";
Match match = Regex.Match(line, @"<a href=""(?<addr>[^""]*)"" class=""folderNav""><strong>(?<val>[^<]*)</strong></a>");
string addr = match.Groups["addr"].Value;   // users_folders.cfm?viewfolder=86&viewsub=20207&addSub=20202
string val = match.Groups["val"].Value;     // 087690898

The (?<name>expression) parts are called “named matched subexpressions”; you can read more about them in the MSDN documentation on regular expression grouping constructs.

In the code above, we’re using named subexpressions to match your address and your value. In each case, we allow any character to be matched except the expected terminator. In the case of the href address, the attribute value ends just before the "; thus, we match [^"]* (written [^""]* inside the C# verbatim string, where "" stands for a single quote). In the case of the <strong> value, the element text ends just before the < of the closing tag; thus, we match [^<]*. The rest of the pattern is matched literally.
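If the input is a whole page rather than a single line, the same pattern can be run with `Regex.Matches` to collect every address/value pair. This is a minimal sketch: the class and method names (`FolderNavExtractor`, `ExtractAll`) are hypothetical, and it assumes every link keeps exactly the attribute layout shown in the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: applies the answer's pattern to a whole page
// and collects every (addr, val) pair it finds.
public static class FolderNavExtractor
{
    // Same pattern as the answer: [^""]* stops at the closing quote of href,
    // [^<]* stops at the "<" that starts </strong>.
    private static readonly Regex Pattern = new Regex(
        @"<a href=""(?<addr>[^""]*)"" class=""folderNav""><strong>(?<val>[^<]*)</strong></a>");

    public static List<(string Addr, string Val)> ExtractAll(string html)
    {
        var results = new List<(string Addr, string Val)>();
        // Matches returns every non-overlapping match, in document order.
        foreach (Match m in Pattern.Matches(html))
        {
            results.Add((m.Groups["addr"].Value, m.Groups["val"].Value));
        }
        return results;
    }
}
```

Because both groups stop at their terminator character, two links on the same line are still returned as two separate matches.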

Douglas

Something like this just worked for me:

<a href="(?<HREF>[^\"\ ]*)"[^\>]*><strong>(?<TEXT>.*)</strong>

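using System.Text.RegularExpressions;

// subjectString is the line of HTML being searched.
Regex regexObj = new Regex("<a href=\"(?<HREF>[^\" ]*)\"[^>]*><strong>(?<TEXT>.*)</strong>", RegexOptions.IgnoreCase);

var match = regexObj.Match(subjectString);

if (match.Success)
{
    string href = match.Groups["HREF"].Value;   // the link target
    string text = match.Groups["TEXT"].Value;   // the text inside <strong>
}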
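A minimal sketch wrapping this pattern in a reusable method. It shows that `[^>]*` skips whatever other attributes follow the href and that `RegexOptions.IgnoreCase` also accepts upper-case tags. The names `LooseLinkParser` and `TryParse` are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around this answer's pattern.
public static class LooseLinkParser
{
    // [^\" ]* stops the href value at a quote or a space; [^>]* skips any
    // remaining attributes (class, target, ...) up to the end of the <a> tag.
    private static readonly Regex Pattern = new Regex(
        "<a href=\"(?<HREF>[^\" ]*)\"[^>]*><strong>(?<TEXT>.*)</strong>",
        RegexOptions.IgnoreCase);

    // Returns true and fills href/text on a match; otherwise both are null.
    public static bool TryParse(string line, out string href, out string text)
    {
        Match match = Pattern.Match(line);
        href = match.Success ? match.Groups["HREF"].Value : null;
        text = match.Success ? match.Groups["TEXT"].Value : null;
        return match.Success;
    }
}
```

One design trade-off: `.*` is greedy, so on a line containing two `<strong>` elements TEXT would run to the last `</strong>`. The first answer's `[^<]*` (or a lazy `.*?`) avoids that.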

To parse HTML, it is better to use the HTML Agility Pack.

Sergey K