0

I have a string, raw_text, that contains two <div> elements inside one <body>, as follows:

var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";

I need a script that prints on the screen only the <div> in raw_text that contains a certain string.

If the wanted string is:

var x = "that I want";

the script should take:

<div>This is the 'div' text that I want to print.</div>

and the output should be:

This is the 'div' text that I want to print.
Mr Lister

3 Answers

1

This is the proper way to do it:

  1. Use a DOM parser
  2. Iterate the text nodes
  3. Check if they contain the desired string

var html = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
var x = "that I want";
var doc = new DOMParser().parseFromString(html, 'text/html');
var it = doc.createNodeIterator(doc.body, NodeFilter.SHOW_TEXT);
var node;
while ((node = it.nextNode())) {
  if (node.nodeValue.includes(x)) {
    console.log(node.nodeValue);
    break;
  }
}
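
The loop above stops at the first matching text node. If several <div> elements may contain the string, a variant along the following lines could collect all of them. This is only a sketch: findDivTexts and sample are hypothetical names, it needs an environment that provides DOMParser (a browser), and textContent of an outer <div> also includes the text of any nested <div>.

```javascript
// Hypothetical helper: returns the text of every <div> in `html`
// whose text contains `needle`. Requires a DOMParser (browser).
function findDivTexts(html, needle) {
  var doc = new DOMParser().parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll("div"))
    .map(function (div) { return div.textContent; })
    .filter(function (text) { return text.includes(needle); });
}

var sample = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";

// Only run where a DOM parser exists; plain Node.js has none by default.
if (typeof DOMParser !== "undefined") {
  console.log(findDivTexts(sample, "that I want"));
}
```

The function returns an array, so an empty array signals that no <div> matched.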
Oriol
0
var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
var x = "that I want";
var homework_solution = raw_text.match(new RegExp("<div>([^<>]*?"+x+"[^<>]*?)</div>"))[1];

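This should do the job. The regex could be made more robust: x is inserted unescaped, so any regex metacharacters in it are not matched literally, and match() returns null (so the [1] throws) when no <div> matches.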
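
One possible way to harden the regex approach is sketched below. It is an illustration under assumptions, not a general HTML parser: escapeRegExp and findDivText are hypothetical helper names, and the pattern still only handles plain <div> tags without attributes or nested markup.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the search string is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the text of the first attribute-less <div>
// that contains `needle`, or null when there is no such <div>.
function findDivText(html, needle) {
  var pattern = new RegExp("<div>([^<>]*?" + escapeRegExp(needle) + "[^<>]*?)</div>");
  var m = html.match(pattern);
  return m ? m[1] : null;
}

var raw_text = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";
console.log(findDivText(raw_text, "that I want"));
// prints: This is the 'div' text that I want to print.
```

Returning null instead of indexing the match result directly lets the caller decide what to do when nothing is found.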

The "proper" way to do this would be to use DOMParser to search for the node you want.

Retr0id
  • [You can't parse HTML with regex](http://stackoverflow.com/a/1732454/1529630). Your last sentence should be in bold. – Oriol Oct 08 '16 at 18:14
0

You can use jQuery to convert your string to proper DOM elements and then search them easily, as @Retr0spectrum says in their comment. You have the HTML in a plain string:

var htmlString = "<body><div>This is the 'div' text that I don't want.</div> <div>This is the 'div' text that I want to print.</div></body>";

Now you have to:

  1. parse it to DOM,
  2. filter the elements, and
  3. get the text

Like this:

// Process the string through jQuery so it parses the DOM elements
var dom = $(htmlString);

// and then we convert to array...
var array = dom.toArray();

// ... so we can filter it, using RegEx to find the
// <div>(s) we are interested in: 
var matchingDivs = array.filter(function (div, i) {
  return $(div).text().match(/that I want/g) !== null;
});

// we pop the last matched div from the filtered array (the first 
// one would also work, since normally you will find just one)
var theDiv = matchingDivs.pop(); 

// Then get the <div>'s text:
var theText = theDiv.textContent;

The beautiful thing is you can chain all the methods so you can write the above like this:

var theText = $(htmlString).toArray().filter(function (div, i) {
  return $(div).text().match(/that I want/g) !== null;
})[0].textContent;

Note: In the chained methods example I took the first element instead of the last one, using the bracket operator [0] instead of pop().

Hope this helps you understand how it works.