Regarding the possible dupe post: Replace only some groups with Regex

This is not a dupe: that post replaces the group with static text, whereas I want the replacement to retain the text captured by the group.

I have some text containing patterns like:

\super 1 \nosupersub
\super 2 \nosupersub
...
\super 592 \nosupersub

I want to replace them using regex such that they become:

<sup>1</sup>
<sup>2</sup>
...
<sup>592</sup>

So, I am using the following regex (note the group (\d+)):

RegexOptions options = RegexOptions.Multiline; //as of v1.3.1.0 default is multiline
mytext = Regex.Replace(mytext, @"\s?\\super\s?(\d+)\s?\\nosupersub\s", @"<sup>\1</sup>", options);

However, instead of getting what I want, every match is replaced with the literal <sup>\1</sup>:

<sup>\1</sup>
<sup>\1</sup>
...
<sup>\1</sup>

If I try the same regex replacement in a text editor such as Sublime Text (https://www.sublimetext.com) or in Python, it works as expected.

How can I do a group replacement in C# that retains the number captured by (\d+)?

Ian

2 Answers

Many regex tools use the \1 notation to refer to a group's value in the replacement pattern (the same syntax as a backreference). For whatever reason, Microsoft chose to use $1 instead in the .NET implementation of regex, so in your case the replacement string should be @"<sup>$1</sup>". Note that backreferences inside the pattern itself still use the \1 syntax in .NET; only the syntax in the replacement pattern is different. See the Substitutions section of this page for more info.
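
To make the difference concrete, here is a minimal sketch that reuses the pattern from the question and changes only the replacement string. The class name SupDemo, the helper ToSup, and the sample input are made up for illustration; they are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SupDemo
{
    // Hypothetical helper: same pattern as in the question,
    // but the replacement uses $1 instead of \1.
    public static string ToSup(string text)
    {
        return Regex.Replace(
            text,
            @"\s?\\super\s?(\d+)\s?\\nosupersub\s",
            "<sup>$1</sup>",
            RegexOptions.Multiline);
    }

    public static void Main()
    {
        string sample = "\\super 1 \\nosupersub\n\\super 592 \\nosupersub\n";

        // The trailing \s in the pattern also consumes each line break,
        // so the results end up on one line.
        Console.WriteLine(ToSup(sample)); // <sup>1</sup><sup>592</sup>

        // Inside the pattern, \1 is still a backreference:
        // this matches a digit followed by the same digit.
        Console.WriteLine(Regex.IsMatch("33", @"(\d)\1")); // True
    }
}
```

If a literal digit has to follow the group's value directly, the braced form ${1} keeps it unambiguous (for example "${1}0" rather than "$10", which would be read as group 10).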

Steven Doggart

I haven't tested this code and wrote it from memory so this might not work but the general idea is there.

Why use regex at all?

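// Requires: using System; using System.Collections.Generic;
List<string> output = new List<string>();
foreach (string line in myText.Split(new string[] { Environment.NewLine }, StringSplitOptions.None))
{
    // Verbatim strings (@"...") keep the backslashes literal. A plain "\super"
    // does not compile, and the "\n" in "\nosupersub" would be read as a newline.
    string alteredLine = line.Replace(@"\super", "").Replace(@"\nosupersub", "").Trim();

    int n;
    if (Int32.TryParse(alteredLine, out n))
    {
        output.Add("<sup>" + n + "</sup>");
    }
    else
    {
        // Add the original input in case it failed?
        output.Add(line);
    }
}
// Join the processed lines back into a single string.
myText = string.Join(Environment.NewLine, output);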

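Or, for a LINQ version:

// Requires: using System; using System.Linq;
// Select returns IEnumerable<string>, so join the lines back into one string.
myText = string.Join(Environment.NewLine,
    myText.Split(new string[] { Environment.NewLine }, StringSplitOptions.None)
          .Select(l => "<sup>" + l.Replace(@"\super", "").Replace(@"\nosupersub", "").Trim() + "</sup>"));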
TheLethalCoder
  • Because regex is short. – M.kazem Akhgary May 10 '16 at 16:15
  • @M.kazemAkhgary That doesn't mean it's the right tool. Plus, this could be shorter; I just added extra checks that the OP may not need. – TheLethalCoder May 10 '16 at 16:16
  • I agree that regex is not always the best tool, but oftentimes it is because it is so powerful and configurable (i.e. you can store the patterns in a config file or DB). Whether or not it's the best tool for the job in this case isn't really up to us to say. It doesn't strike me as a bad option for something even as simple as this, since for me it's shorter, easier to read, and therefore easier to maintain. That being said, I realize that for people who aren't as familiar with it, there's a big learning curve, so that is a factor worth considering depending on your team. – Steven Doggart May 10 '16 at 16:21
  • Actually, I am creating a program that has to support regex group replacement for my friend. That is why I use regex. I agree that regex is not always the best tool. Well, thanks for the suggestion anyway. You got my upvote. – Ian May 10 '16 at 16:22
  • @StevenDoggart I love regular expressions and use them a lot; it's just that a lot of the time I see them being used where they possibly shouldn't be. So I like to answer regex questions with a non-regex solution if one isn't already available, so that others can see there are alternatives if need be. – TheLethalCoder May 10 '16 at 16:23