1

I have a regex that finds an opening and closing span tags that contain at least a style attribute right after the opening span, and I want to delete all pairs of these span tags (either empty or non-empty).

My regex is: str.replace(/<span style=.*>(\s|.*)<\/span>/g,"");

I have this (see below) block of text (string) which contains 4 pairs of span tags. When trying to remove all 4 instances of these span tags (including any data in between) using the above regex, it only removes the first instance but deletes everything after the first 2 <br><br>.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Some text goes here</span> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Some other text goes here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Could be any text here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Could be any other text here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>

AndyDev0918
  • 83
  • 1
  • 10
  • 4
    Manipulating a structured document format like HTML with regular expressions is [rarely a good idea](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Phil Aug 05 '19 at 23:55
  • Why is the position of the `style` attribute within the tag significant? Do you expect other `` elements that do not have the `style` attribute first that should be ignored? – Phil Aug 05 '19 at 23:59
  • Try this `str.replace(/([\S\s]*?)<\/span\s*>/g,"");` –  Aug 06 '19 at 00:17
  • [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239) – Toto Aug 06 '19 at 13:14

1 Answers1

1

The problem with your code is that your wildcard expressions (ie .*) are too greedy.

An alternative is to use DOM methods to remove the elements.

You can inject your HTML string into a temporary element, find the <span> elements and remove them.

const str = `Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Some text goes here</span> Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Some other text goes here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Could be any text here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <span style="color: rgb(51, 51, 51); font-family: &quot;Segoe UI Emoji&quot;; font-size: 49px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;">Could be any other text here</span> xcepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.<br><br>`

const el = document.createElement('div')
el.innerHTML = str

el.querySelectorAll('span[style]').forEach(span => {
  span.parentNode.removeChild(span)
})

const newStr = el.innerHTML

document.querySelector('pre').textContent = newStr
/* this is just for formatting */
pre {
  white-space: pre-wrap;
}
<!-- this is just for displaying the results -->
<pre></pre>
Phil
  • 157,677
  • 23
  • 242
  • 245