
My input is:

'<div><strong>test</strong><div><strong>kkk</strong></div><div><strong>111</strong></div></div>'

And the output needs to be:

'<div><strong>test</strong><div><strong>kkk</strong></div><div>111</div></div>'

(that is, only the <strong> tag around 111 is removed)

Here is what I have so far:

var input = '<div><strong>test</strong><div><strong>kkk</strong></div><div><strong>111</strong></div></div>';

var effect = 'strong',  // dynamic
    selectedText = '111';  // dynamic

var regex = new RegExp('<' + effect + '>.*' + selectedText + '.*<\/' + effect + '>');
                       
if(regex.test(input)){
    console.log('replacing');
    input = input.replace(regex, selectedText);
}

console.log(input);

For some reason, it gives me this output instead: <div>111</div></div>

How do I fix this?

Wiktor Stribiżew
Rahul Desai
  • Please don't process xml/html with regexes... – Willem Van Onsem Jun 02 '15 at 11:06
  • @CommuSoft What is the better option and why is it better? – Rahul Desai Jun 02 '15 at 11:07
  • You can for instance [create a dummy DOM](http://stackoverflow.com/a/10585079/67579) and then rewrite it that way; or use xpath. It is better because [a regex is not capable of processing recursive structures](http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-reg). – Willem Van Onsem Jun 02 '15 at 11:09

2 Answers


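Your current regex <strong>.*111.*<\/strong> matches too much because .* is greedy: the match starts at the first <strong> (the one around test) and only ends at the </strong> after 111.

Restrict the wildcards so that they consume only non-tag characters, using [^<>]*: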

var input = '<div><strong>test</strong><div><strong>kkk</strong></div><div><strong>111</strong></div></div>';

var effect = 'strong',  // dynamic
    selectedText = '111';  // dynamic

var regex = new RegExp('<' + effect + '>([^<>]*)' + selectedText + '([^<>]*)</' + effect + '>');
// Or this one if you do not want to keep any text around the selectedText
//var regex = new RegExp('<' + effect + '>[^<>]*' + selectedText + '[^<>]*</' + effect + '>');
                       
if(regex.test(input)){
    console.log('replacing');
    input = input.replace(regex, "$1" + selectedText + "$2");
    // Or this one if you opt to replace the whole text with your selectedText
    // input = input.replace(regex, selectedText);
}

console.log(input);
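To see why the character class matters, here is a small sketch comparing the greedy .* from the question, a lazy .*? (added only for comparison; it is not part of either answer) and [^<>]* on the question's input. The lazy version does not help here: the engine still starts at the leftmost <strong> and keeps expanding until it reaches 111, crossing the intermediate tags along the way.

```javascript
// Compare three ways of writing the wildcard part of the pattern on the question's input.
const input = '<div><strong>test</strong><div><strong>kkk</strong></div><div><strong>111</strong></div></div>';
const effect = 'strong';
const selectedText = '111';

const patterns = {
  greedy: '<' + effect + '>.*' + selectedText + '.*</' + effect + '>',
  lazy: '<' + effect + '>.*?' + selectedText + '.*?</' + effect + '>',
  // [^<>]* cannot cross a tag, so the match cannot span several elements.
  noTags: '<' + effect + '>[^<>]*' + selectedText + '[^<>]*</' + effect + '>',
};

const results = {};
for (const [name, source] of Object.entries(patterns)) {
  const match = input.match(new RegExp(source));
  results[name] = match ? match[0] : null;
  console.log(name + ': ' + results[name]);
}
// greedy and lazy both match from "<strong>test" up to "111</strong>";
// only noTags matches just "<strong>111</strong>".
```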
Wiktor Stribiżew
  • `([^<>]*)' + selectedText + '([^<>]*)` and `"$1" + selectedText + "$2"` are for the case where you want to keep the text around your `selectedText`. If you do not want to keep it, you can use the original suggestion. Which one works better? I have now added both options to the answer. – Wiktor Stribiżew Jun 02 '15 at 11:18
  • Both work well; I can't say which one is better. I will do a few tests and let you know. – Rahul Desai Jun 02 '15 at 11:37
  • Don't do a few tests, do a few million. I guarantee that whenever you process XML with regular expressions, someone can write a test case where your code will give the wrong answer. – Michael Kay Jun 02 '15 at 13:50
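To make that point concrete, here is a minimal sketch that runs the capturing regex from this answer, wrapped in a hypothetical unwrap helper, against a few hypothetical but valid HTML inputs. In each case the <strong> element really does contain 111, yet the string comes back unchanged.

```javascript
// Hypothetical helper wrapping the capturing regex from the answer above.
function unwrap(html, effect, selectedText) {
  const regex = new RegExp('<' + effect + '>([^<>]*)' + selectedText + '([^<>]*)</' + effect + '>');
  // Keep the text captured before and after selectedText, drop the tags.
  return html.replace(regex, (match, before, after) => before + selectedText + after);
}

// Hypothetical inputs that the regex does not handle.
const attributeCase = '<div><strong class="x">111</strong></div>'; // attribute breaks the literal "<strong>"
const upperCaseTag = '<div><STRONG>111</STRONG></div>';            // the match is case-sensitive
const nestedMarkup = '<div><strong>x<em>y</em>111</strong></div>'; // [^<>]* cannot step over <em>

console.log(unwrap(attributeCase, 'strong', '111')); // unchanged
console.log(unwrap(upperCaseTag, 'strong', '111'));  // unchanged
console.log(unwrap(nestedMarkup, 'strong', '111'));  // unchanged
console.log(unwrap('<div><strong>111</strong></div>', 'strong', '111')); // <div>111</div>
```

A DOM-based approach, as suggested in the comments on the question, sees all three elements as ordinary <strong> nodes regardless of attributes, case or nested children.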

.* in your regex is matched greedily, i.e., it tries to cover as much of the input as possible while still letting the whole expression match. Removing the wildcards fixes the issue, as long as the tag contains exactly selectedText and nothing else:

var input = '<div><strong>test</strong><div><strong>kkk</strong></div><div><strong>111</strong></div></div>';

var effect = 'strong',  // dynamic
    selectedText = '111';  // dynamic

var regex = new RegExp('<' + effect + '>' + selectedText + '<\/' + effect + '>');
                       
if(regex.test(input)){
    console.log('replacing');
    input = input.replace(regex, selectedText);
}

console.log(input);

Output:

"replacing"
"<div><strong>test</strong><div><strong>kkk</strong></div><div>111</div></div>"
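Because selectedText is dynamic, it is inserted into the pattern as regex syntax rather than as literal text. The sketch below uses a hypothetical escapeRegExp helper (a commonly used escaping pattern, not part of either answer) and a hypothetical unwrapExact wrapper around this answer's regex. With text such as 1+1 and no escaping, 1+ means "one or more 1s", so the regex skips the intended element, matches <strong>11</strong> instead and replaces its text.

```javascript
// Hypothetical helper: escape regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical wrapper around the exact-match regex from this answer.
function unwrapExact(html, effect, selectedText, escape) {
  const text = escape ? escapeRegExp(selectedText) : selectedText;
  const regex = new RegExp('<' + effect + '>' + text + '</' + effect + '>');
  // A replacer function keeps a "$" in selectedText from being read as a replacement pattern.
  return html.replace(regex, () => selectedText);
}

const html = '<div><strong>1+1</strong><strong>11</strong></div>';
console.log(unwrapExact(html, 'strong', '1+1', false)); // <div><strong>1+1</strong>1+1</div>
console.log(unwrapExact(html, 'strong', '1+1', true));  // <div>1+1<strong>11</strong></div>
```

Passing a function to replace also matters for the answers' own code if the selected text can contain $, since a replacement string treats sequences such as $& specially.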

But as CommuSoft mentioned in the comments, XML- or HTML-specific tools (such as the dummy DOM or XPath suggested there) are a much better choice for querying and manipulating nested XML or HTML structures. Your manipulation can be viewed as a transformation from one nested structure to another, differently nested one, so a dedicated tree-transformation tool might be worth a look.

Community
das-g