
So I am parsing a string with HTML content inside it, like this (simplified for the purposes of the example):

var htmlProd = "this is <div> my test </div> string <div> I want to extract this </div>";

Ideally, I would like to extract the two substrings inside the divs into an array, with the end result being:

myStrings = ["my test","I want to extract this"]

I have tried a few things but I am stumped. This is what I have so far. I am having trouble getting each substring; the solutions I have found only get one.

var myStrings = htmlProd.match(">(.*)<"); 

Any help would be greatly appreciated. I can use either jQuery or plain JavaScript in the solution.
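For reference, a short sketch of why the attempt above returns only one result. Passing a string to `match()` converts it to a RegExp without the `g` flag, so only the first match comes back, and the greedy `.*` stretches that match from the first `>` to the last `<`. Adding `g` and a lazy `.*?` does return every match, but it also picks up the text between the divs, because the pattern never mentions the tag names.

```javascript
// The string from the question.
const htmlProd = "this is <div> my test </div> string <div> I want to extract this </div>";

// 1. The original attempt: the string pattern becomes a RegExp WITHOUT the g flag,
//    so match() returns only the first match. Because .* is greedy, that match
//    runs from the first ">" all the way to the last "<".
const original = htmlProd.match(">(.*)<");
// original[1] is " my test </div> string <div> I want to extract this "
console.log(original[1]);

// 2. With g and a lazy quantifier every match is returned (full matches, not groups),
//    but " string " between </div> and <div> is captured as well.
const lazyGlobal = htmlProd.match(/>(.*?)</g);
// lazyGlobal contains "> my test <", "> string <", "> I want to extract this <"
console.log(lazyGlobal);
```

This is why the answers below either anchor the pattern on the `<div>`/`</div>` tags or avoid regex entirely and let the DOM do the parsing.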

Zakaria Acharki
chris

4 Answers


Since you're using jQuery, you could treat the string as HTML and do it like this:

Suggestion using jQuery

var container = $('<div>').html("this is <div> my test </div> string <div> I want to extract this </div>");

var myStrings = container.find('div').map(function() {
  return $(this).text().trim();
}).get();

console.log(myStrings);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

Suggestion using Regex

var myStrings = "this is <div> my test </div> string <div> I want to extract this </div>".match(/<div>(.*?)<\/div>/g);

$.each(myStrings, function(i, v) {
  // Strip the <div> and </div> tags, then trim the surrounding spaces
  myStrings[i] = v.replace(/<\/?div>/g, "").trim();
});

console.log(myStrings);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
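The regex suggestion above only matches a bare `<div>` tag whose content sits on one line, and it strips the tags afterwards. A minimal sketch, assuming an ES2020+ runtime for `String.prototype.matchAll`, reads capture group 1 directly and also tolerates attributes, upper-case tags and line breaks. `extractDivTexts` is a hypothetical helper name. It still assumes no nested divs: for `<div><div>a</div>b</div>` it would capture `<div>a`, which is where the DOM-based answers are the safer choice.

```javascript
// Hypothetical helper; not part of jQuery or any library.
function extractDivTexts(html) {
  // <div\b[^>]*>  opening tag, optionally with attributes (\b rejects e.g. <divider>)
  // ([\s\S]*?)    lazily capture up to the first closing tag, line breaks included
  // <\/div>       closing tag; the i flag also accepts <DIV>...</DIV>
  const pattern = /<div\b[^>]*>([\s\S]*?)<\/div>/gi;
  // matchAll (requires the g flag) yields one match object per occurrence;
  // m[1] is the captured text between the tags.
  return Array.from(html.matchAll(pattern), (m) => m[1].trim());
}

const htmlProd = "this is <div> my test </div> string <div> I want to extract this </div>";
console.log(extractDivTexts(htmlProd)); // returns ["my test", "I want to extract this"]

// Attributes, upper-case tags and line breaks are tolerated:
console.log(extractDivTexts('<div class="a">\n first\n</div><DIV>second</DIV>')); // returns ["first", "second"]
```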
Zakaria Acharki

You can take a different approach here. Since you are working with an HTML string, you can load it as the HTML content of a temporary element and then use the DOM to read the content.

var htmlProd = "this is <div> my test </div> string <div> I want to extract this </div>";

// Create a temporary element as a container for the html string
let temp = document.createElement("section");

// Load the string into the container
temp.innerHTML = htmlProd;

// Use the DOM to extract the strings within the <div> elements...

// First, get the div elements into a node list
let divs = temp.querySelectorAll("div");

// Now, iterate the nodes and place the contents into a new array
let results = Array.prototype.slice.call(divs).map(function(div){
  return div.textContent.trim();
});

// Results
console.log(results);
Scott Marcus
  • this is just what I needed. I was assuming that I might need to treat it as HTML but did not know a proper way to do so. This made it clear – chris Sep 04 '18 at 16:53

Using jQuery's map(): pass the HTML string into an empty element, then traverse that element.

var htmlProd = "this is <div> my test </div> string <div> I want to extract this</div>"


var txtArr = $('<div>').html(htmlProd)
                      .find('div')
                      .map(function(_,el){return el.textContent.trim()})
                      .get();
console.log(txtArr)
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
charlietfl

Another way to do it, using a regex:

const regex = /<div>(.*?)<\/div>/gm;
const str = `this is <div> my test </div> string <div> I want to extract this </div>`;
let m;
let myStrings = [];
while ((m = regex.exec(str)) !== null) {
  // m[1] holds the text captured between <div> and </div>
  myStrings.push(m[1].trim());
}

console.log(myStrings)

Regex: https://regex101.com/r/hMIidd/1
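The loop above works because `regex` has the `g` flag: each `exec()` call starts searching at `regex.lastIndex` and advances it past the match, and a failed call returns `null` and resets `lastIndex` to 0. A minimal sketch of the pitfall this creates when a shared global regex is reused after stopping early, plus one way to avoid it. `collectDivTexts` is a hypothetical helper name.

```javascript
// Same pattern and input as the answer above.
const regex = /<div>(.*?)<\/div>/g;
const str = "this is <div> my test </div> string <div> I want to extract this </div>";

// One exec() call: finds the first div and moves lastIndex past it.
const first = regex.exec(str);
const afterFirst = regex.lastIndex;
console.log(first[1].trim()); // my test
console.log(afterFirst);      // 28, the index just past the first </div>

// Stopping early leaves that state behind: the next exec() starts at index 28,
// even on a different, shorter string, so it finds nothing.
const leftover = regex.exec("<div>short</div>");
console.log(leftover); // null (the failed exec() also resets lastIndex to 0)

// Hypothetical helper: a regex literal inside the function body creates a fresh
// RegExp on every call, so no lastIndex state leaks between calls.
function collectDivTexts(s) {
  const re = /<div>(.*?)<\/div>/g;
  const out = [];
  let m;
  while ((m = re.exec(s)) !== null) {
    out.push(m[1].trim());
  }
  return out;
}

console.log(collectDivTexts(str));
console.log(collectDivTexts("<div>short</div>"));
```

Since this pattern can never match an empty string, the zero-width guard that regex101 generates is not needed here; it only matters for patterns that can match zero characters.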

A l w a y s S u n n y