I have a client input string in which some words are highlighted, like <em>TEST</em>. However, the string also contains other < characters, either a lone < or a < followed by some letters. I want to replace those < characters with another character, or delete them, while keeping <em>TEST</em> intact.

I want to use a regular expression that matches every < except the ones in <em>TEST</em>. I have tried many patterns without success. Please help me out.

Keannylen
  • You can't parse HTML with regexes, but in the simple case where there is no HTML tag nesting in the string, use a negative lookahead. – Andy Ray Jan 17 '23 at 00:06
  • except `` or except `TEST`? The title and the body of your question are inconsistent. Also, what about ``? And what about `<` occurring inside emphasis, e.g. `TEST this`? – Inigo Jan 17 '23 at 02:42
  • Please see https://stackoverflow.com/a/1732454/3412322. Then see https://stackoverflow.com/a/6520267/3412322. – Daniel Beck Jan 17 '23 at 03:59
  • [You can't parse HTML with regex.](https://stackoverflow.com/a/1732454/238884) – Michael Lorton Jan 17 '23 at 03:59
  • @AndyRay your pattern will match the `<` in `` as well as in ``. See my answer. – Inigo Jan 17 '23 at 03:59
  • @MichaelLorton yes, but the higher voted answer to the same question is legit too: [While arbitrary HTML with only a regex is impossible, it's sometimes appropriate to use them for parsing a limited, known set of HTML.](https://stackoverflow.com/a/1733489/8910547) – Inigo Jan 17 '23 at 04:04

1 Answer

/<(?!\/?em>)/

This assumes you want to ignore all <em> and </em>, not just <em>TEST</em>.
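As a quick check outside the browser, the pattern can be exercised directly with `String.prototype.replace`. This is only a minimal sketch: the helper name `replaceStrayLt` and the sample string are assumptions made for illustration, not part of the question.

```javascript
// Hypothetical helper: replaces every "<" that does not start "<em>" or "</em>".
// The negative lookahead (?!\/?em>) rejects a "<" that is followed by "em>" or "/em>".
function replaceStrayLt(input, replacement = '') {
  return input.replace(/<(?!\/?em>)/g, replacement)
}

const sample = '4 < 3 and <b>bold</b> but <em>TEST</em>'

// Delete the stray "<" characters (default replacement is the empty string).
console.log(replaceStrayLt(sample))

// Or swap them for something else, here the HTML entity for "<".
console.log(replaceStrayLt(sample, '&lt;'))
```

Note that only the `<` itself is matched, so the rest of a non-`em` tag (for example `b>`) stays in the string.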

⚠️ Using a regex instead of a proper HTML parser will break on corner cases, or even on common cases you weren't anticipating. Use at your own risk. See the links in the comments above. You can keep adding to the regex to handle more cases, but it will never get to 100%.
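To make the warning concrete, here is a small sketch of inputs where the pattern stops matching the intent. The sample strings are made up for illustration; whether they matter depends on what the client input can actually contain.

```javascript
const pattern = /<(?!\/?em>)/g

// Each sample is markup a browser would still treat as emphasis, yet the
// pattern replaces some "<" because the text after it is not exactly "em>" or "/em>".
const samples = [
  '<EM>TEST</EM>',             // uppercase tag name (the pattern is case-sensitive)
  '<em class="hit">TEST</em>', // attribute on the opening tag
  '<em >TEST</em >',           // whitespace before ">"
]

const results = samples.map(s => s.replace(pattern, '❌'))
console.log(results)
```

The first case could be handled by adding the `i` flag, but each such fix only covers one more variation.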

Try it out with the snippet below. The output is updated as you type in the text area.

const pattern = /<(?!\/?em>)/g

const inField = document.getElementById('in')
const outField = document.getElementById('out')

function escapeHtml(unsafe) {
  return unsafe
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;")
}

function update() {
  const screened = inField.value.replace(pattern, '❌')
  outField.innerHTML = escapeHtml(screened)
}

inField.addEventListener('input', update, false)
inField.value = `test "<" replacement:
 - should NOT be replaced in: <em>TEST</em>
 - should be replaced in: <b>
 - should be replaced in: 4 < 3
 - should be replaced in: <em and </em
 - should be replaced in: <emph>these</emph>
`
update()

<!-- Markup used by the script above -->
<textarea id="in" style="width:100%;height:40vh"></textarea>
<pre><code><div id="out"></div></code></pre>
Inigo