I have a large string and I need to extract a value from it. The value is located between the delimiters

category = '

and

';

This is my regex, but I need to avoid outputting the delimiters.

String productCategory = Regex.Match(html, @"category = '(.*?)';").Value;

This is the example: category = 'Video Cards';

and I need to extract Video Cards

Andrew

3 Answers

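You can use lookbehind and lookahead assertions, which require the delimiters to be present without including them in the match. Keep the quantifier lazy (`.*?`) so that, in a large string, the match stops at the first closing delimiter instead of the last one on the line:

string pattern = @"(?<=category = ').*?(?=';)";
string productCategory = Regex.Match(html, pattern).Value;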
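
As a rough, self-contained sketch of the lookaround approach: the class name `LookaroundDemo`, the helper `ExtractCategory`, and the sample input are all made up for illustration and are not from the question. It uses the lazy form of the quantifier so that a second `';` later on the same line is not swallowed.

```csharp
using System;
using System.Text.RegularExpressions;

class LookaroundDemo
{
    // Hypothetical helper: returns the text between "category = '" and "';",
    // or null when the delimiters are not found.
    public static string ExtractCategory(string html)
    {
        // (?<=...) and (?=...) assert the delimiters without consuming them,
        // so Match.Value contains only the text in between.
        Match m = Regex.Match(html, @"(?<=category = ').*?(?=';)");
        return m.Success ? m.Value : null;
    }

    static void Main()
    {
        // Assumed sample input with two quoted assignments on one line.
        string html = "var category = 'Video Cards'; var brand = 'Acme';";
        Console.WriteLine(ExtractCategory(html)); // prints "Video Cards"
    }
}
```

With a greedy `.*` the same input would yield `Video Cards'; var brand = 'Acme`, because the match would run to the last `';` on the line.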

It's also worth mentioning that parsing HTML with regexes is a bad idea. You should use an HTML parser to parse HTML.

Servy

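Have you considered using the Groups property of the Match object? If you test your current regex in a tester such as Derek Slager's, you'll see that exactly what you want is the first capture group. Note that Groups[0] is the entire match, so the first capture group is at index 1. Keep the Match object rather than its .Value string, and read the group from it:

Match match = Regex.Match(html, @"category = '(.*?)';");
string productCategory = match.Groups[1].Value;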
tmesser

You want the value of the first capture group rather than the whole match:

String productCategory = Regex.Match(html, @"category = '(.*?)';").Groups[1].Value; 
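
To make the group indexing concrete, here is a small sketch; the class name `GroupsDemo`, the helper `FirstGroup`, and the sample string are assumptions for illustration. It prints the whole match (index 0), the first capture group (index 1), the group count, and what .NET returns for an index that has no group.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupsDemo
{
    // Hypothetical helper: returns the Match for the pattern from the question.
    public static Match MatchCategory(string html)
    {
        return Regex.Match(html, @"category = '(.*?)';");
    }

    // Hypothetical helper: value of the first capture group ("" if no match).
    public static string FirstGroup(string html)
    {
        return MatchCategory(html).Groups[1].Value;
    }

    static void Main()
    {
        Match m = MatchCategory("category = 'Video Cards';");

        Console.WriteLine(m.Groups[0].Value);   // whole match: category = 'Video Cards';
        Console.WriteLine(m.Groups[1].Value);   // first capture group: Video Cards
        Console.WriteLine(m.Groups.Count);      // 2 (whole match + one capture group)

        // An index with no corresponding group does not throw; it returns
        // a Group whose Success is false and whose Value is empty.
        Console.WriteLine(m.Groups[2].Success); // False
    }
}
```

An unsuccessful group at index 1 therefore usually means the pattern did not match the input, not that the index is out of range.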
zimdanen
  • Unless I'm mistaken, Groups is initialized from zero and this has only one subgroup. Using Groups[1] would probably throw an out of bounds exception. – tmesser May 07 '12 at 17:02
  • What he's getting in `.Value` is the 0th-indexed group (i.e., `.Groups[0].Value`). – zimdanen May 07 '12 at 17:03
  • Incorrect. Please check [a regex tester](http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx) with the value 'Video Cards' and the expression '(.*?)' - there is only one Group available. – tmesser May 07 '12 at 17:06
  • @YYY - Your expression is incorrect. The OP has this expression: `@"category = '(.*?)';"` – zimdanen May 07 '12 at 17:10
  • It is merely interpreted literally and broke the OP's sample data, so it was an error (at least when I was playing with it, the OP has since been updated). Still, if you add it on it doesn't change the behavior of Groups at all, since the regular expression engine doesn't have a Group there to include, only a character that it must interpret literally. – tmesser May 07 '12 at 17:14
  • Quite so, it worked in VS2010 in a small console app exactly the same way it worked in Derek Slager's RegEx tester. – tmesser May 07 '12 at 17:16
  • *Did you try the code that I posted?* Not using your expression, but using the OP's expression. I pasted the code in the pastebin. Go ahead and try the code that I used. – zimdanen May 07 '12 at 17:18
  • Copy-pasted it into my project, used the immediate window to make my results more terse. Not sure what to tell you, my friend! `Regex.Match(test, @"category = '(.*?)';").Groups[1] {} base {System.Text.RegularExpressions.Capture}: {} Captures: {System.Text.RegularExpressions.CaptureCollection} Success: false` – tmesser May 07 '12 at 17:25
  • I get a different result: `Regex.Match(test, @"category = '(.*?)';").Groups[1] {test} base {System.Text.RegularExpressions.Capture}: {test} Captures: {System.Text.RegularExpressions.CaptureCollection} Success: true` .... – zimdanen May 07 '12 at 17:30
  • Well that's bloody weird! I certainly don't think I have another place to get it, I haven't done any voodoo with RegExs on this machine. I'll have to look into this more, though that's way out of the scope of this question, I think. – tmesser May 07 '12 at 17:31