Using regular expressions to do any sort of HTML manipulation is almost always a bad idea.
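To illustrate the problem, here is a minimal sketch of one way regexes go wrong on HTML. The input string and the pattern are made up for this example (they are not from the question); the point is only that a pattern has no notion of nesting, so it pairs the first opening tag with the first closing tag it finds:

```javascript
// Hypothetical input: two correctly nested <span> elements
const nestedHtml = '<span class="outer">a <span class="inner">b</span> c</span>'

// A naive "match one span element" pattern (illustrative only)
const naiveSpan = /<span[^>]*>(.*?)<\/span>/

const nestedMatch = nestedHtml.match(naiveSpan)

// The match starts at the outer opening tag but ends at the inner closing tag
console.log(nestedMatch[0])
// <span class="outer">a <span class="inner">b</span>
```

A greedy `.*` would fail in the opposite direction on sibling elements, which is why a real parser is the more reliable tool here.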
My recommended solution is to do what the browser does: parse the string into a DOM (in a similarly fuzzy, forgiving way) and then serialize that DOM back into HTML.
In a browser environment this is especially easy, because you can let the browser itself do the work: write the bad HTML into the innerHTML of an element, then read it back, and the browser will have fixed it for you:
const badHtml = `
<div>Wakanda Forever</div> <span class="movie">Black Panther</span>
Movies movies movies
<span class="movie">Spider man...
`
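// Assigning to innerHTML runs the browser's forgiving HTML parser
const element = document.createElement('i')
element.innerHTML = badHtml
// Reading it back serializes the repaired DOM, with the unclosed <span> now closed
const result = element.innerHTML
console.log(result)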
In Node.js, you could instead use a library like cheerio:
// Recent cheerio releases no longer provide a default export, so import the namespace
import * as cheerio from 'cheerio'
const badHtml = `
<div>Wakanda Forever</div> <span class="movie">Black Panther</span>
Movies movies movies
<span class="movie">Spider man...
`
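// load() parses the input as a full document, so the serialized result is
// wrapped in <html><head></head><body>; cheerio.load(badHtml, null, false)
// parses it as a fragment instead
const $ = cheerio.load(badHtml)
const result = $.html()
console.log(result)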