
I am trying to extract text from an HTML string, but it is not working as expected.

The HTML string I have is:

<div data-content-type="html" data-appearance="default" data-element="main">&lt;p&gt;The Angelina Tank Dress is simple yet sophisticated. This dress can be thrown over a swimsuit for last minute lunch plans or belted for dinner on the patio. The high-low hemline gives it the perfect amount of swing. &lt;/p&gt;&lt;p&gt;Features:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Scoopneck&lt;/li&gt;&lt;li&gt;Sleeveless&lt;/li&gt;&lt;li&gt;Hits below the knee&lt;/li&gt;&lt;li&gt;Longer back hemline&lt;/li&gt;&lt;li&gt;Machine wash, tumble dry low&lt;/li&gt;&lt;/ul&gt;</div>

There is a description text, and there is text inside the ul li elements. How can I extract all of that text separately? For example, extract the description text on its own and the text inside the li elements on its own.

I tried:

const productDescription = productDetails.description.replace(/<div>|<\/div>|<ul>|<li>/g, "").trim().split("Features:");

I would like the text to be:

The Angelina Tank Dress is simple yet sophisticated. This dress can be thrown over a swimsuit for last minute lunch plans or belted for dinner on the patio. The high-low hemline gives it the perfect amount of swing.

Scoopneck Sleeveless Hits below the knee Longer back hemline Machine wash, tumble dry low

LosMos

2 Answers

1
<script>

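    // Decode the entity-encoded markup, then strip the tags and return the plain text.
    function stripHtml(html) {
        // A textarea treats its content as text, so reading .value decodes &lt; and &gt;.
        var textarea = document.createElement("textarea");
        textarea.innerHTML = html;
        // Parse the decoded markup in a detached element and read back only its text.
        var container = document.createElement("div");
        container.innerHTML = textarea.value;
        return container.textContent || container.innerText || "";
    }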

    var htmlString = `<div data-content-type="html" data-appearance="default " data-element="main">&lt;p&gt;The Angelina Tank Dress is simple yet sophisticated. This dress can be thrown over a swimsuit for last minute lunch plans or belted for dinner on the patio. The high-low hemline gives it the perfect amount of swing. &lt;/p&gt;&lt;p&gt;Features:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Scoopneck&lt;/li&gt;&lt;li&gt;Sleeveless&lt;/li&gt;&lt;li&gt;Hits below the knee&lt;/li&gt;&lt;li&gt;Longer back hemline&lt;/li&gt;&lt;li&gt;Machine wash, tumble dry low&lt;/li&gt;&lt;/ul&gt;</div>`;

    console.log(stripHtml(htmlString));
</script>
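Note that stripHtml returns everything as one string. If the description and the list items are needed separately and no DOM is available (for example, when the string is processed outside the browser), one rough option is to decode the entities first and then pull out the p and li contents with regular expressions. This is only a sketch: decodeEntities, textOf and splitDescription are hypothetical names, only a handful of entities are handled, and it assumes the simple, un-nested markup shown in the question. Regular expressions are not a general HTML parser.

```javascript
// Hypothetical helper: decode only the few entities seen in the question's string.
function decodeEntities(str) {
  return str
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" is not decoded twice
}

// Hypothetical helper: collect the trimmed inner text of every <tag>...</tag> pair.
// Assumes the tags are not nested inside themselves.
function textOf(html, tag) {
  const pattern = new RegExp(`<${tag}(?:\\s[^>]*)?>([\\s\\S]*?)</${tag}>`, "g");
  return Array.from(html.matchAll(pattern), (m) => m[1].trim());
}

// Split the encoded description into the paragraph text and the list items.
// Dropping the literal "Features:" paragraph is an assumption about the data.
function splitDescription(encoded) {
  const html = decodeEntities(encoded);
  const paragraphs = textOf(html, "p").filter((text) => text !== "Features:");
  return {
    description: paragraphs.join(" "),
    features: textOf(html, "li"),
  };
}

// Shortened sample in the same shape as the string from the question.
const encoded =
  '<div data-element="main">&lt;p&gt;Simple yet sophisticated. &lt;/p&gt;' +
  "&lt;p&gt;Features:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Scoopneck&lt;/li&gt;" +
  "&lt;li&gt;Sleeveless&lt;/li&gt;&lt;/ul&gt;</div>";

const result = splitDescription(encoded);
console.log(result.description); // Simple yet sophisticated.
console.log(result.features);    // [ 'Scoopneck', 'Sleeveless' ]
```

When a DOM is available, parsing the decoded markup with the browser and querying it is the more robust choice.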
Faiz Sandhi
0
  1. Get the text content of the element.
  2. Create a new temporary div element, and add the text content as HTML to that element.
  3. Use querySelectorAll to find all the p and li elements.
  4. Iterate over that node list (after creating an array from it so we can use map).
  5. For each node, check its nodeName and return an object with type and text properties.

const html = document.querySelector('div').textContent;

const div = document.createElement('div');
div.innerHTML = html;

const els = div.querySelectorAll('p, li');

const arr = Array.from(els).map(el => {
  const type = el.nodeName === 'P' ? 'para' : 'item';
  return { type, text: el.textContent }
});

console.log(arr);
<div data-content-type="html" data-appearance="default" data-element="main">&lt;p&gt;The Angelina Tank Dress is simple yet sophisticated. This dress can be thrown over a swimsuit for last minute lunch plans or belted for dinner on the patio. The high-low hemline gives it the perfect amount of swing. &lt;/p&gt;&lt;p&gt;Features:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;Scoopneck&lt;/li&gt;&lt;li&gt;Sleeveless&lt;/li&gt;&lt;li&gt;Hits below the knee&lt;/li&gt;&lt;li&gt;Longer back hemline&lt;/li&gt;&lt;li&gt;Machine wash, tumble dry low&lt;/li&gt;&lt;/ul&gt;</div>
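To get from that array to the two separate pieces of text asked for in the question, the entries can be grouped by type. A minimal sketch, assuming the { type, text } shape produced above; groupByType is a hypothetical name, and the "Features:" heading is dropped by matching its literal text, which is an assumption about the data.

```javascript
// Hypothetical helper: split the { type, text } entries into description and features.
function groupByType(entries, heading = "Features:") {
  // Paragraphs, minus the heading, joined into a single description string.
  const description = entries
    .filter((e) => e.type === "para" && e.text.trim() !== heading)
    .map((e) => e.text.trim())
    .join(" ");
  // List items kept as an array so the caller decides how to join them.
  const features = entries
    .filter((e) => e.type === "item")
    .map((e) => e.text.trim());
  return { description, features };
}

// Sample input in the shape that the snippet above logs (shortened).
const sample = [
  { type: "para", text: "The high-low hemline gives it the perfect amount of swing. " },
  { type: "para", text: "Features:" },
  { type: "item", text: "Scoopneck" },
  { type: "item", text: "Sleeveless" },
];

const { description, features } = groupByType(sample);
console.log(description);        // The high-low hemline gives it the perfect amount of swing.
console.log(features.join(" ")); // Scoopneck Sleeveless
```

Keeping features as an array leaves it open whether to join the items with spaces, as in the question, or render them as a list.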
Andy