I have markdown content with multiple images where each image is:

![Image Description](/image-path.png)

I am selecting all images in the markdown content using Regex:

var matches = new Regex(@"!\[.*?\]\((.*?)\)").Matches(content);

I am getting 2 groups:

Groups[0] = ![Image Description](/image-path.png);  > (Everything)

Groups[1] = /image-path.png                         > (Image Path)

Wouldn't it be possible to get this instead?

Groups[0] = Image Description                       > (Image Description)

Groups[1] = /image-path.png                         > (Image Path)
Miguel Moura
  • If you put part of your regular expression in parentheses `( ... )` then that becomes a capture group. It's not just for precedence / order of operations. Possible duplicate: https://stackoverflow.com/questions/6375873/regular-expression-groups-in-c-sharp Groups[0] will always be the whole thing. but you'll have a Groups[1] and Groups[2] of your own design that capture the right portions of the text. like `@"!\[(.*?)\]\((.*?)\)"` – Wyck Feb 11 '22 at 16:29
  • Does this answer your question? [Regular Expression Groups in C#](https://stackoverflow.com/questions/6375873/regular-expression-groups-in-c-sharp) – Wyck Feb 11 '22 at 16:32
  • For C#, you can give the groups a name, and specify `RegexOptions.ExplicitCapture` to only capture the named groups. `!\[(?.*?)\]\((?.*?)\)` – Richard Deeming Feb 11 '22 at 16:34
  • @Wyck That will give 2 capture groups, not a match and a single group with a different value than the match. – The fourth bird Feb 11 '22 at 16:40
  • @Thefourthbird Maybe I misunderstood, but I believe this is all that is desired: ([evidence](https://dotnetfiddle.net/IRs8xk)). – Wyck Feb 11 '22 at 16:48
  • @Wyck That will give you group 1 and group 2 and `Console.WriteLine(matches[0].Groups[0]);` will give you the full match which is too much looking at the question. – The fourth bird Feb 11 '22 at 16:50
  • Yes, and actually it gives you 3 groups: Group 0 (full text) Group 1 (just the image description) and Group 2 (just the path), which I believe is what is desired here. Is it the indices of 0 and 1 vs 1 and 2 that you are hung up on? It's hardly worth the effort to introduce a non-capturing group just to get the right indices, IMO. The auto-captures are auto-numbered starting at 1 - knowing that, I suspect, is likely sufficient information to unblock the asker. We can let @MiguelMoura decide. – Wyck Feb 11 '22 at 16:55
  • @Wyck Yes, but group 0 is the full match, and group 1 and group 2 are the capture group values. The question just states group 0 and group 1. – The fourth bird Feb 11 '22 at 17:01
  • Yeah, you're technically right. And you've adeptly created a (more complicated) regular expression that satisfies that requirement (by cleverly making the subexpression be the entire match and capturing the rest from inside the assertions). But I believe that asking for the captures to be at indices 0 and 1 constitutes an **XY problem**, and I believe the intent was just to parse the markdown into its fields. I will provide an answer that uses a simpler regular expression as an alternate approach, just in case. Yours is correct too, though. – Wyck Feb 11 '22 at 17:35

2 Answers

Currently the group 1 value is part of the matched string.

You could make the full match Image Description and capture only /image-path.png in group 1 by using a lookbehind and a lookahead that contains a capture group:

(?<=!\[)[^][]*(?=]\(([^()]*)\))

The pattern in parts matches:

  • (?<=!\[) Assert ![ to the left
  • [^][]* Match zero or more characters other than [ and ]
  • (?= Positive lookahead to assert to the right
    • ]\(([^()]*)\) Match ] and capture in group 1 what is between (...)
  • ) Close lookahead

.NET regex demo

string pattern = @"(?<=!\[)[^][]*(?=]\(([^()]*)\))";
string input = @"![Image Description](/image-path.png)";

Match m = Regex.Match(input, pattern);
Console.WriteLine(m.Groups[0]);
Console.WriteLine(m.Groups[1]);

Output

Image Description
/image-path.png
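
Since the question mentions content with multiple images, here is a minimal sketch of the same pattern applied with Regex.Matches. The class name LookaroundDemo and the sample content are assumptions made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Same pattern as above: the lookbehind and lookahead keep "![", "](" and ")"
    // out of the match, so Groups[0] is only the description.
    public const string Pattern = @"(?<=!\[)[^][]*(?=]\(([^()]*)\))";

    public static void Main()
    {
        // Hypothetical sample content containing two images.
        string content = "Intro ![First](/a.png) text ![Second](/b.png) end";

        foreach (Match m in Regex.Matches(content, Pattern))
        {
            // Groups[0] = description (the whole match), Groups[1] = image path.
            Console.WriteLine($"{m.Groups[0].Value} -> {m.Groups[1].Value}");
        }
    }
}
```

The capture group sits inside the lookahead, which is why the path is available in Groups[1] without being part of the match itself.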

The fourth bird

You can capture the relevant sections of your content text by using a capture group.

Compare your regex and mine, where I made a very small change by adding parentheses to capture the Image Description part of your content:

!\[.*?\]\((.*?)\)
!\[(.*?)\]\((.*?)\)

Capture groups are automatically numbered starting at index 1, so these groups are available as

  • matches[0].Groups[1]: which contains Image Description and
  • matches[0].Groups[2]: which contains /image-path.png

matches[0].Groups[0] is still the whole match.

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string content = @"![Image Description](/image-path.png)";
        var matches = new Regex(@"!\[(.*?)\]\((.*?)\)").Matches(content);
        Console.WriteLine(matches[0].Groups[1]); // Image Description
        Console.WriteLine(matches[0].Groups[2]); // /image-path.png
    }
}

This outputs:

Image Description
/image-path.png

Here is a Runnable .NET Fiddle of the above.
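
If the numeric indices are the sticking point, named groups are another option, as one of the comments on the question suggests. This is only a sketch: the group names description and path and the class name NamedGroupDemo are assumptions picked for illustration, not names taken from the question or the comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // "description" and "path" are hypothetical group names chosen for this sketch.
    // RegexOptions.ExplicitCapture makes only named groups capture.
    public static readonly Regex ImagePattern = new Regex(
        @"!\[(?<description>.*?)\]\((?<path>.*?)\)",
        RegexOptions.ExplicitCapture);

    public static void Main()
    {
        string content = @"![Image Description](/image-path.png)";

        foreach (Match m in ImagePattern.Matches(content))
        {
            // Look the values up by name instead of by index.
            Console.WriteLine(m.Groups["description"].Value);
            Console.WriteLine(m.Groups["path"].Value);
        }
    }
}
```

Groups[0] is still the whole match here, but the calling code no longer depends on which number each capture happens to get.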

Wyck