0

EDIT: This example uses HTML, but I need this type of scenario for working with other types of strings. Please read this as a regex issue, not an HTML issue.

Let's say I have a string like this:

<h1>Hello</h1><h2>World</h2><h3>!</h3>

I may need to insert text before any one of those heading tags, but let's use this example, where I just want to modify <h2> to look like this:

<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>

Since I may need to replace any of the headings, I only search for <h* using regex. Now, I want my code to say "of all the <h* tags you found, only replace the second one".

I thought I found the answer here: How do I replace a specific occurrence of a string in a string?

Unfortunately, the results are not what I am looking for. Here is my sample code:

    private void button1_Click(object sender, EventArgs e)
    {
        //sample html file string:
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";

        //this text should replace <h2 with <div id="h2div"></div><h2
        var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>" + "<h2";
        int replacementIndex = 1; //only replace the second occurrence found by regex.

        //find ALL occurrences of <h1 through <h6 in the file, but only replace <h2.
        htmlText = Regex.Replace(htmlText, "<h([1-6])", m => replacementString + replacementIndex++);

    }

It does not matter whether I specify replacementIndex or replacementIndex++, which makes sense but I just wanted to match the code as closely as possible to the answer I found.

The output looks like this:

<div id="h2div"></div><h21>Hello</h1><div id="h2div"></div><h22>World</h2><div id="h2div"></div><h23>!</h3>

There are several problems here. First, only one <div> tag should have been created, rather than three. Second, every match is replaced with <h2 plus the counter value instead of being left alone, so we end up with <h21, <h22, and <h23.

Over the past few months I have gotten better at understanding regex matching, but I am still unfamiliar with regex MatchEvaluators and groups, which I guess is what I need here.

Could you recommend how I can fix the code so I can replace a specific index of a regex match?

Bill
  • I would recommend using HtmlAgilityPack instead of Regex for manipulating HTML. – Tim Apr 08 '16 at 19:41
  • This is just an example. I have some non-html scenarios where this is required as well. – Bill Apr 08 '16 at 19:42

3 Answers

0

Sorry, I can't answer in C#, but the solution should be very similar. For your particular case, the regexp argument for JavaScript's String.prototype.replace() is /(<h1.+?\/h1>)/ and the replacement argument is '$1<div id="h2div"></div>'. So:

var str = "<h1>Hello</h1><h2>World</h2><h3>!</h3>",
 repStr = str.replace(/(<h1.+?\/h1>)/,'$1<div id="h2div"></div>');

console.log(repStr) // "<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>"

Or, if you don't want to use a capture group, you can use $& (the whole match) instead:

var repStr = str.replace(/<h1.+?\/h1>/,'$&<div id="h2div"></div>');

which will essentially give the same result in this particular case.
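
For reference, here is a rough C# analog of the JavaScript above, as a sketch only. It assumes the same sample string; the class and method names are made up for illustration. .NET replacement patterns also accept `$&` for the whole match, and the instance overload `Regex.Replace(input, replacement, count)` limits the number of replacements, which mimics JavaScript's non-global replace. Like the JavaScript version, it targets the h1 element specifically rather than the nth match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class and method names, for illustration only.
public static class FirstMatchDemo
{
    public static string InsertDivAfterH1(string input)
    {
        var rx = new Regex("<h1.+?/h1>");
        // "$&" stands for the entire match; count = 1 replaces only the first match.
        return rx.Replace(input, "$&<div id=\"h2div\"></div>", 1);
    }
}
```

Calling `FirstMatchDemo.InsertDivAfterH1("<h1>Hello</h1><h2>World</h2><h3>!</h3>")` should place the div right after the closing h1 tag.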

Redu
  • Thank you for the response. Looking at your code, I think you assume I am looking for only h2, but I am looking for the second occurrence in a regex match. In other cases, I might be looking for the third occurrence. Also, the string is an example, so there is no guarantee h2 will come immediately after h1. – Bill Apr 08 '16 at 20:12
  • "... where I just want to modify <h2> to look like this: `<h1>Hello</h1><div id="h2div"></div><h2>World</h2><h3>!</h3>`" From what I understand, you don't want to replace the h2 tag; you are inserting a div with an id of `h2div` between the h1 and h2 tags. So that's what I did. Sorry if I am not getting exactly what you are after. – Redu Apr 08 '16 at 20:21
  • Hello. Think of it this way.. you could find any tag with that looks like this ` – Bill Apr 08 '16 at 20:42
0

How about using a MatchEvaluator? Count the matches inside the evaluator and only change the second one:

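    // requires: using System.Text.RegularExpressions;

    // Number of matches seen so far during the current Replace call.
    private static int count = 0;

    // Called once per match; only the second match gets the <div> prepended.
    static string CapText(Match m)
    {
        count++;

        if (count == 2)
        {
            return "<div id=\"h2div\"></div>" + m.Value;
        }

        // Every other match is returned unchanged.
        return m.Value;
    }

    private void button1_Click()
    {
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";
        count = 0; // reset so repeated calls start counting from the first match
        Regex rx = new Regex(@"<h([1-6])");
        var result = rx.Replace(htmlText, new MatchEvaluator(CapText));
    }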
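
A variation on the same idea, as a minimal sketch: a lambda can capture a local counter, which avoids the shared static field. The class name `NthMatch`, the method name `InsertBeforeNth`, and its parameters are hypothetical, not part of any library; `index` is assumed to be zero-based.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class NthMatch
{
    // Inserts `prefix` before the match at zero-based position `index`;
    // every other match is returned unchanged.
    public static string InsertBeforeNth(string input, string pattern, int index, string prefix)
    {
        int seen = 0; // local counter captured by the lambda, fresh on every call
        return Regex.Replace(input, pattern,
            m => seen++ == index ? prefix + m.Value : m.Value);
    }
}
```

The key point is that the evaluator returns `m.Value` for matches that should stay as they are. The code in the question returned the replacement string for every match and concatenated the counter onto it, instead of comparing the counter against the target index.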
Jason
  • I am trying to test this, but am unclear as to what you mean by `// do something here`? – Bill Apr 08 '16 at 20:56
0

I struggled with this for a full day. Naturally, asking the question sometimes gets the creative juices flowing, so this is the solution I came up with. It uses MatchCollection and then uses a string builder to insert the string. The string builder might be overkill for this, but it works :-)

The replacementIndex defines which match the text is inserted before. In my case, the regex finds three matches and the code targets the match at index 1. From there, I get that match's starting index in the string and insert the text at that position. This is just test code from a button to prove the functionality.

    private void button1_Click(object sender, EventArgs e)
    {
        //sample text.
        var htmlText = "<h1>Hello</h1><h2>World</h2><h3>!</h3>";

        //the string builder will handle replacing the text.
        var stringBuilder = new StringBuilder(htmlText);

        //build the replacement text.
        var replacementString = "<div id=\"" + "h2div" + "\">" + "</div>";
        int replacementIndex = 1; //only replace the second occurrence found by regex (zero-indexed).

        //find ALL occurrences of <h1 through <h6 in the file, but only replace <h2.
        var pattern = "<h([1-6])";
        MatchCollection matches = Regex.Matches(htmlText, pattern); //get all the matches.
        int startIndex = matches[replacementIndex].Index; //get the starting string index for the match.

        //insert the required text just before the found match.
        stringBuilder.Insert(startIndex, replacementString);

        //copy text to clipboard and display it on screen.
        htmlText = stringBuilder.ToString();
        System.Windows.Forms.Clipboard.SetText(htmlText);
        MessageBox.Show(htmlText);
    }
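
The same approach can be pulled out of the button handler into a small helper, sketched below. The names `MatchInsert` and `InsertBeforeMatch` are made up for illustration. The sketch assumes a zero-based index and returns the input unchanged when there are fewer matches than requested, where the handler above would throw on `matches[replacementIndex]`. It uses `string.Insert` in place of the StringBuilder, since only one insertion is made.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class MatchInsert
{
    // Inserts `text` immediately before the match at zero-based position `index`.
    public static string InsertBeforeMatch(string input, string pattern, int index, string text)
    {
        MatchCollection matches = Regex.Matches(input, pattern);

        // Assumption: an out-of-range index leaves the input untouched.
        if (index < 0 || index >= matches.Count)
        {
            return input;
        }

        // Match.Index is the position in `input` where that match starts.
        return input.Insert(matches[index].Index, text);
    }
}
```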
Bill