Replace a token with a different token only when it is inside a specific tag

Question

const lessonText = "<div><blockquote>&quot;&quot;</blockquote><p>Heni &quot;</p><blockquote>no quotation</blockquote><span>hi</span><blockquote><span style=\"font-family:Courier New,Courier,monospace;\">row.names(vol) &lt;- c(&quot;</span></blockquote></div>"

Regex to capture and replace &quot; with " when &quot; is inside a blockquote tag. That means I don't want the &quot; in <p>Heni &quot;</p> to be changed to ".

and why do you need regex instead of fixing the source HTML directly? — Samuel Liew, Aug 09 '22 at 06:21
The source comes from the CKEditor (WYSIWYG) editor via the backend, so there is no way to modify the HTML directly. — Henok Tesfaye, Aug 09 '22 at 06:27
See [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/a/1732454/3832970). — Wiktor Stribiżew, Aug 09 '22 at 13:57
It's a string; I've added a screenshot in the question. I would like to replace some tokens using regex before displaying it as HTML. — Henok Tesfaye, Aug 09 '22 at 13:58

Peter Thoeny · Accepted Answer · 2022-08-09T17:55:31.093

You'd need a proper HTML parser to support nested tags; for example, blockquote tags could be nested.

Here is a solution if you can live with the limitation of not supporting corner cases. It uses nested replaces: the outer one identifies blockquote tags with their content, and the inner one acts on the blockquote content:

const lessonText = "<div><blockquote>&quot;&quot;</blockquote><p>Heni &quot;</p><blockquote>no quotation</blockquote><span>hi</span><blockquote><span style=\"font-family:Courier New,Courier,monospace;\">row.names(vol) &lt;- c(&quot;</span></blockquote></div>"
let result = lessonText.replace(/(<blockquote>)(.*?)(<\/blockquote>)/g, function(m, g1, g2, g3) {
  return g1 + g2.replace(/&quot;/g, '"') + g3;
});
console.log('Result: ' + result);

Result: <div><blockquote>""</blockquote><p>Heni &quot;</p><blockquote>no quotation</blockquote><span>hi</span><blockquote><span style="font-family:Courier New,Courier,monospace;">row.names(vol) &lt;- c("</span></blockquote></div>

Explanation of outer replace regex:

(<blockquote>) -- capture group 1: opening blockquote tag
(.*?) -- capture group 2: non-greedy scan over content until:
(<\/blockquote>) -- capture group 3: closing blockquote tag
/g flag -- replace all matches, not just the first
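A quick way to see why capture group 2 is non-greedy: on a shortened, made-up version of the question's string (the variable names are just for illustration), a greedy (.*) runs from the first opening tag to the last closing tag and also rewrites the <p> in between, while (.*?) matches each blockquote separately.

```javascript
// Shortened sample: one <p> sits between two blockquotes.
const sample = '<blockquote>&quot;</blockquote><p>&quot;</p><blockquote>&quot;</blockquote>';

// Same callback shape as the answer: decode &quot; only inside the matched content.
const decode = (m, open, body, close) => open + body.replace(/&quot;/g, '"') + close;

// Greedy: a single match spanning from the first <blockquote> to the last </blockquote>.
const greedy = sample.replace(/(<blockquote>)(.*)(<\/blockquote>)/g, decode);

// Non-greedy: each blockquote is matched on its own, so the <p> is left alone.
const lazy = sample.replace(/(<blockquote>)(.*?)(<\/blockquote>)/g, decode);

console.log(greedy); // <blockquote>"</blockquote><p>"</p><blockquote>"</blockquote>
console.log(lazy);   // <blockquote>"</blockquote><p>&quot;</p><blockquote>"</blockquote>
```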

Note:

Limitation: this fails with nested tags, because the non-greedy match stops at the first </blockquote> it finds
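If you expect attributes on the blockquote, you could use the regex (<blockquote(?: [^>]*)?>) instead; the inner group is non-capturing so that g2 and g3 in the callback still refer to the content and the closing tag. This does not support corner cases like <blockquote title="<gotcha>!">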
If you expect newlines in the blockquote content you can use regex ([\s\S]*?) instead.
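Putting the two notes together, a sketch of the outer regex that accepts attributes and newlines could look like this. The function name decodeInBlockquotesRegex and the sample input are hypothetical. It is still regex-based, so nested blockquotes and the attribute corner case above are not handled.

```javascript
// (?: ...) keeps the attribute part out of the capture groups, so the callback
// still receives exactly: opening tag, content, closing tag.
// [\s\S]*? lets the lazy content match span newlines.
const blockquoteRe = /(<blockquote(?: [^>]*)?>)([\s\S]*?)(<\/blockquote>)/g;

// Hypothetical helper: decode &quot; only inside blockquote content.
function decodeInBlockquotesRegex(html) {
  return html.replace(blockquoteRe, (m, open, body, close) =>
    open + body.replace(/&quot;/g, '"') + close);
}

const input = '<blockquote class="note">line 1 &quot;\nline 2 &quot;</blockquote><p>&quot;</p>';
console.log(decodeInBlockquotesRegex(input));
// <blockquote class="note">line 1 "
// line 2 "</blockquote><p>&quot;</p>
```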

Explanation of inner replace regex:

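&quot; -- match the literal text &quot; (you could also pass the string "&quot;" to replaceAll() instead; with plain replace() a string pattern only replaces the first occurrence)
/g flag -- replace all matches, not just the first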
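If nested blockquotes can occur, one middle ground short of a full HTML parser is to split the string into tags and text runs and track how many blockquotes are open. This is a sketch under assumptions (well-formed markup, no comments or CDATA, no > inside attribute values), and decodeInBlockquotesNested is a hypothetical name. Because only text runs are rewritten, a &quot; inside an attribute value stays encoded.

```javascript
// Hypothetical helper: decode &quot; in text inside any <blockquote>, however
// deeply nested. Assumes well-formed markup without comments, CDATA, or ">"
// characters inside attribute values.
function decodeInBlockquotesNested(html) {
  // With a capturing group, split() keeps the separators, so the array
  // alternates text runs (even indexes) and tags (odd indexes).
  const parts = html.split(/(<[^>]*>)/);
  let depth = 0;
  return parts
    .map((part, i) => {
      if (i % 2 === 1) {
        // A tag: adjust the depth for blockquote tags, never rewrite the tag itself.
        if (/^<blockquote\b[^>]*>$/i.test(part)) depth++;
        else if (/^<\/blockquote\s*>$/i.test(part)) depth = Math.max(0, depth - 1);
        return part;
      }
      // A text run: decode only while at least one blockquote is open.
      return depth > 0 ? part.replace(/&quot;/g, '"') : part;
    })
    .join('');
}

const nested = '<blockquote>&quot;<blockquote>&quot;</blockquote>&quot;</blockquote><p>&quot;</p>';
console.log(decodeInBlockquotesNested(nested));
// <blockquote>"<blockquote>"</blockquote>"</blockquote><p>&quot;</p>
```

For comparison, the lazy regex in the answer stops at the first </blockquote>, so on this nested sample the &quot; after the inner closing tag would stay encoded.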

score -2 · Answer 2 · answered Aug 09 '22 at 07:03

You can try:

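// lines holds the HTML string (e.g. lessonText from the question)
lines = lines.replace(/<blockquote.*?<\/blockquote>/g, (fullMatch) => {
    // the g flag replaces every &quot; in this blockquote
    return fullMatch.replace(/&quot;/g, '"');
});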
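The g flag on the inner regex matters here: String.prototype.replace with a regex that lacks g replaces only the first match, and the question's first blockquote contains two &quot; entities. A minimal illustration (variable names are just for this example):

```javascript
const content = '<blockquote>&quot;&quot;</blockquote>';

// Without the g flag only the first occurrence is replaced.
const firstOnly = content.replace(/&quot;/, '"');

// With the g flag every occurrence is replaced.
const all = content.replace(/&quot;/g, '"');

console.log(firstOnly); // <blockquote>"&quot;</blockquote>
console.log(all);       // <blockquote>""</blockquote>
```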

answered by user1075296
