0

Is there any way to simplify an HTML string, for example by removing all redundant tags from it?

For instance:

Source HTML:

<div><span><span>1</span></span><span>2</span></div>

Expected output:

<div><span>12</span></div>

(or even less)

<div>12</div>

I know that some libraries, such as quilljs, can do this, but it's a huge library and overkill for my case.

Also, https://github.com/htacg/tidy-html5 is close to what I want, but it does not have a JS release.

Littlee
  • Is this a front-end requirement, or do you need to process it on the back end? If it's on the front end, would it be possible to use a regex to strip away the tags? – CodeMonkey Dec 23 '21 at 04:11
  • Have to reference https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Tomer W Dec 23 '21 at 04:35
  • @CodeMonkey This is a front-end requirement; I am looking for a JS solution. – Littlee Dec 23 '21 at 06:57
  • 2
    How do you determine if it's redundant? – Dave Newton Dec 23 '21 at 07:01
  • @DaveNewton https://github.com/htacg/tidy-html5 is what I expected but it does not have js release – Littlee Dec 23 '21 at 07:05
  • 1
    The quoted [tidy-html5](https://www.html-tidy.org/documentation/) library does not simply remove elements, like you showed in your question. Instead it repairs inconsistencies in the markup to make it valid HTML. This is hard work that cannot be done in a one-liner. – Carsten Massmann Dec 23 '21 at 07:12
  • 1
    You need to define when a tag is "redundant". E.g. is a span around a span redundant if the outer span adds no text? Or is a span around a span *always* redundant? Or are *all* spans redundant? The desired result of `<div>12</div>` seems to suggest that you consider all spans redundant, which someone who is writing *CSS for these elements* might totally not expect. – Peter B Dec 23 '21 at 12:20
  • Tidy won't randomly remove perfectly valid tags. And again, it's not clear how you define a "redundant tag"--there may be a reason they're there. Anything that can parse HTML will allow *you* the opportunity to modify it based on whatever rules you think make sense. – Dave Newton Dec 23 '21 at 16:01

2 Answers

0

You can try using the DOMParser:

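// Parse the string as XML; this requires well-formed markup with a single root element.
const s = `<div><span><span>1</span></span><span>2</span></div>`
const d = new DOMParser()
const doc = d.parseFromString(s, 'application/xml')
// Keep only the root tag and the concatenated text of all its descendants.
const tag = doc.children[0].tagName
const text = doc.children[0].textContent

const result = `<${tag}>${text}</${tag}>`
console.log(result) // <div>12</div>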
Timur
  • what I want is a more general-purpose solution, please notice that the tag can be any html tag, like 0_0 – Littlee Dec 23 '21 at 07:00
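
For the more general-purpose case asked about in the comment above, here is a minimal sketch that is not part of the original answer. It assumes one possible definition of "redundant": a span that carries no attributes. Both that rule and the names isRedundant and simplifyHtml are hypothetical and only for illustration. Unlike the snippet above, it parses with 'text/html', so the input does not have to be well-formed XML, and it needs a browser (or another environment that provides DOMParser).

```javascript
// Assumption: an element is "redundant" when it is a <span> with no attributes.
// Swap in your own rule here; this is only one possible definition.
function isRedundant(el) {
  return el.tagName.toLowerCase() === 'span' && el.attributes.length === 0;
}

// Parse the HTML string, unwrap every redundant element, and serialize again.
function simplifyHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // querySelectorAll returns a static list, so the tree can be changed while iterating.
  for (const el of doc.body.querySelectorAll('*')) {
    if (isRedundant(el)) {
      // Replace the element with its own children (this "unwraps" it).
      el.replaceWith(...el.childNodes);
    }
  }
  // Merge the adjacent text nodes left behind by unwrapping.
  doc.body.normalize();
  return doc.body.innerHTML;
}
```

Calling simplifyHtml('<div><span><span>1</span></span><span>2</span></div>') in a browser returns '<div>12</div>'. A span that has a class or style attribute is left in place, which addresses the concern raised in the question's comments about CSS that targets those elements.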
-1

Please refer to the code below; it may help you go further.

// Collect the trimmed text of each matching element, then remove the element.
const childs = document.querySelectorAll("div#parent");
const tmpTexts = [];
for (const c of childs) {
    const text = c.innerText.trim();
    if (tmpTexts.includes(text)) continue; // skip text that was already collected
    tmpTexts.push(text);
    c.parentNode.removeChild(c);
}
// Split the first collected text into lines and wrap each line in its own div.
const tmpTextArr = (tmpTexts[0] ?? '').split('\n');
console.log(tmpTextArr);
const para = document.createElement("div");
tmpTextArr.forEach(function (text) {
    const node = document.createElement("div");
    node.appendChild(document.createTextNode(text));
    para.appendChild(node);
});
document.body.appendChild(para);

https://jsfiddle.net/Frangly/pnLgr8ym/66/

For every line of text collected in tmpTexts, you should add a div tag: create a new container element, iterate over the lines, and append one div per line.

frangly