1
var regex = new Regex(@"^(?:&nbsp;)?\((\w+)\)$");
var value = "&nbsp;(HTML)";

//I tried to play around with the following but it captures the whole string
var match = regex.Match(value);

//The following lines all evaluate to the entire string
match.Groups.OfType<Group>().SingleOrDefault();
match.Captures.OfType<Capture>().SingleOrDefault();
match.Groups[0].Captures.OfType<Capture>().SingleOrDefault();

I only want to capture HTML or whatever string it is.

Shimmy Weitzhandler
  • Have you considered using an XML/HTML parser instead? http://stackoverflow.com/a/1732454/1808494 – Aron Apr 07 '15 at 12:32
  • @Aron, I am, and I'm using [HtmlAgilityPack](https://htmlagilitypack.codeplex.com/). Anyway this is an `HtmlTextNode` that the string I'm referring to is returned by its `InnerHtml` property... – Shimmy Weitzhandler Apr 07 '15 at 13:07

3 Answers

1

Your regex is perhaps slightly off: it is missing the capturing group around \w+. The following will return HTML.

var ResourceTypeRegex = new Regex(@"^(?:&nbsp;)?\((\w+)\)$");
var value = "&nbsp;(HTML)";

var match = ResourceTypeRegex.Match(value);

Console.WriteLine("'" + match.Groups[1] + "'");

To get at the capture, index into the Groups collection starting at 1. Groups[0] is always the entire match.
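As a minimal sketch of that indexing rule, the snippet below prints group 0 and group 1 for the same input. The class name GroupIndexDemo and the helper GroupValue are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexDemo
{
    private static readonly Regex ResourceTypeRegex = new Regex(@"^(?:&nbsp;)?\((\w+)\)$");

    // Hypothetical helper: returns the value of the group at the given index.
    public static string GroupValue(string input, int index)
    {
        return ResourceTypeRegex.Match(input).Groups[index].Value;
    }

    public static void Main()
    {
        // Group 0 is the entire match; numbered capturing groups start at 1.
        Console.WriteLine(GroupValue("&nbsp;(HTML)", 0)); // &nbsp;(HTML)
        Console.WriteLine(GroupValue("&nbsp;(HTML)", 1)); // HTML
    }
}
```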

I am not sure why you want to use LINQ on this but since you insist, you can create this extension method:

public static IEnumerable<string> CapturingGroups(this GroupCollection c)
{
    // Index 0 is always the entire match, so skip it and return only the capturing groups.
    return c.OfType<Group>()
            .Skip(1)
            .Select(g => g.Value);
}

And instead of using match.Groups[1], you can change it to Console.WriteLine("'{0}'",match.Groups.CapturingGroups().FirstOrDefault());
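For completeness, here is a self-contained sketch of how such an extension method could be wired up and called. Extension methods must live in a static class; the class names GroupCollectionExtensions and Program are assumptions, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupCollectionExtensions
{
    // Skips group 0 (the entire match) and yields the remaining group values lazily.
    public static IEnumerable<string> CapturingGroups(this GroupCollection groups)
    {
        return groups.OfType<Group>().Skip(1).Select(g => g.Value);
    }
}

public static class Program
{
    public static void Main()
    {
        var match = Regex.Match("&nbsp;(HTML)", @"^(?:&nbsp;)?\((\w+)\)$");

        // With exactly one capturing group, FirstOrDefault() yields its value.
        Console.WriteLine("'{0}'", match.Groups.CapturingGroups().FirstOrDefault()); // 'HTML'
    }
}
```

Because the query is built on IEnumerable<string>, nothing is enumerated until FirstOrDefault() runs, so no intermediate list is allocated.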

Running example: https://dotnetfiddle.net/097fo9

Jimmy Chandra
  • I only want to capture `HTML` or whatever string it is. – Shimmy Weitzhandler Apr 07 '15 at 12:32
  • Simple enough, change the regex to `var ResourceTypeRegex = new Regex(@"^(?:&nbsp;)?\((\w+)\)$");`. Now the captured value excludes the enclosing (). Updated the dotnetfiddle with a working example. – Jimmy Chandra Apr 07 '15 at 12:43
  • So isn't there a way to refer only to capturing groups, i.e. `match.CapturingGroups.SingleOrDefault()`? Please always wrap code snippets in backticks, also in comments. – Shimmy Weitzhandler Apr 07 '15 at 13:01
  • As stribizhev mentioned... AFAIK, you can either use the index or, with a named capture as per Mariusz's answer, the name given to the capture group. I don't think you can use LINQ on this unless you go out of your way to dump the captures into a proper array, perhaps using an extension method, and then use LINQ on that. See: https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.groupcollection(v=vs.110).aspx – Jimmy Chandra Apr 07 '15 at 13:24
  • Your last code has an error because `match.Groups[1]` is not `match.Groups.FirstOrDefault()`, and the `ToList()` can be replaced with `OfType<Group>()`. Anyway, I've made some changes to your post. – Shimmy Weitzhandler Apr 07 '15 at 16:46
  • I ended up using `match.Groups.OfType<Group>().Last();`. Thanks – Shimmy Weitzhandler Apr 07 '15 at 18:33
  • I missed two things: 1) I missed those parentheses in my regex, and 2) I didn't know that the first group (index 0) is always the entire match. And hey, LINQ has been part of the language for ages. Why not keep things less verbose? It will also perform faster than `ToList`, especially if I'm to apply further refinements. The advantage of IEnumerable is that its [execution is deferred](https://msdn.microsoft.com/en-us/library/bb669162.aspx). – Shimmy Weitzhandler Apr 07 '15 at 19:00
0

The example below returns the HTML that sits right after &nbsp;( and before the ) at the end of the string.

^(?:&nbsp;\()? is a non-capturing group that consumes &nbsp;( at the start of the string, before HTML, without adding it to the captured result. (?=\)$) is a positive look-ahead that checks for ) at the end of the string ($). The ) is not consumed, so it is not part of the match.

Regex ResourceTypeRegex = new Regex(@"^(?:&nbsp;\()?(\w+)(?=\)$)");
var value = "&nbsp;(HTML)";
var result56 = ResourceTypeRegex.Match(value).Groups[1].Value;

The output is HTML with no round brackets. The trailing ? in (?:&nbsp;\()? makes the &nbsp;( prefix optional.

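If you use .SingleOrDefault() on a pattern that has no capturing group, it returns only the 0th group, which is equal to the entire match. Once the pattern contains a capturing group, the collection holds more than one element and .SingleOrDefault() throws an InvalidOperationException.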
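A small sketch of how .SingleOrDefault() behaves on match.Groups, assuming the question's pattern with and without the capturing parentheses. The class name SingleOrDefaultDemo is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleOrDefaultDemo
{
    public static void Main()
    {
        var input = "&nbsp;(HTML)";

        // No capturing group: Groups holds only group 0, the entire match.
        var noCapture = Regex.Match(input, @"^(?:&nbsp;)?\(\w+\)$");
        Console.WriteLine(noCapture.Groups.OfType<Group>().SingleOrDefault().Value);

        // One capturing group: Groups holds two entries, so SingleOrDefault throws.
        var oneCapture = Regex.Match(input, @"^(?:&nbsp;)?\((\w+)\)$");
        try
        {
            oneCapture.Groups.OfType<Group>().SingleOrDefault();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("More than one group; use Groups[1] instead.");
        }
    }
}
```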

Wiktor Stribiżew
  • I only want to capture `HTML` or whatever string it is. – Shimmy Weitzhandler Apr 07 '15 at 12:32
  • I edited the answer, now, the round brackets will not be part of the return string. – Wiktor Stribiżew Apr 07 '15 at 12:35
  • Notice that the first group is optional. Anyway isn't there a way to refer only to capturing groups i.e. `match.CapturingGroups.SingleOrDefault()`? – Shimmy Weitzhandler Apr 07 '15 at 13:02
  • Yes, you can. Only use them like this: `regex.Match(input).Groups[1].Value`. – Wiktor Stribiżew Apr 07 '15 at 13:06
  • No, I don't want to refer to the groups by their index. I only want to refer to **capturing groups**. If I have no choice I'll just use a named group, which is what I usually do, but once and for all I'm asking this question to learn about this precise issue. – Shimmy Weitzhandler Apr 07 '15 at 13:08
  • You should know then that there is always the 0th group, which equals the whole match. To obtain the capture group you need, you should address it either by index (starting at 1) or by name (if you use a named capturing group). See the last sentence in my updated answer. – Wiktor Stribiżew Apr 07 '15 at 13:10
0
var match = Regex.Match(inputString, @"^&nbsp;\((?<yourMatch>.*?)\)$");
var value = match.Groups["yourMatch"].Value;
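The two lines above can be wrapped into a runnable sketch that also checks match.Success before reading the named group. The class NamedGroupDemo and the helper ExtractResourceType are hypothetical names, and returning null on a failed match is an assumption about the desired behavior.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: returns the named group's value, or null when the input does not match.
    public static string ExtractResourceType(string inputString)
    {
        var match = Regex.Match(inputString, @"^&nbsp;\((?<yourMatch>.*?)\)$");
        return match.Success ? match.Groups["yourMatch"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractResourceType("&nbsp;(HTML)")); // HTML
        Console.WriteLine(ExtractResourceType("HTML") == null); // True
    }
}
```

Addressing the group by name avoids any dependence on its numeric index, which is what the question was trying to sidestep.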
Mariusz