Find specific div with RegEx and print content

Question

I'm trying to pull some text from an external website using this script.

It works perfectly, but it gets the entire page. I want to take only the content inside a specific div with the class 'content'. The entire page is put inside the variable 'data', and then this function is created to strip some tags:

function filterData(data){
  // strip <body> and </body> tags
  data = data.replace(/<\/?body[^>]*>/g,'');
  // strip line breaks
  data = data.replace(/[\r\n]+/g,'');
  // strip HTML comments
  data = data.replace(/<!--[\S\s]*?-->/g,'');
  // strip noscript and script blocks
  data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
  data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
  // strip self-closing script tags
  data = data.replace(/<script[^>]*\/>/g,'');
  return data;
}

How would I go about finding the div with the class 'content' and only viewing the content inside that?

UPDATE: Sorry about using RegExes — can you help me to get the content without using RegEx? So, this is my HTML file:

<a href="http://www.eurest.dk/kantiner/228/all.asp?a=9" class="ajaxtrigger">erg</a>
<div id="target" style="width:200px;height:500px;"></div>
<div id="code" style="width:200px;height:200px;"></div>
<script src="http://code.jquery.com/jquery.min.js"></script>
<script>
$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    if(url.match('^http')){
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
                "q=select%20*%20from%20html%20where%20url%3D%22"+
                encodeURIComponent(url)+
                "%22&format=xml&callback=?",
        function(data){
          if(data.results[0]){
            var tree = string2dom(data.results[0]);
            container.html($("div.content", tree.doc));
            tree.destroy();
          } else {
            var errormsg = '<p>Error: could not load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    } else {
      $('#target').load(url);
    }
  }
  function filterData(data){

    return tree;
  }
});
</script>

Why are you using regex to parse HTML? Especially with JavaScript in the browser, this is entirely unnecessary; you can use the DOM. – Tomalak, Oct 07 '11 at 09:36
Meanwhile, over on Planet Sane: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — johnsyweb, Oct 07 '11 at 09:53
Yeah, okay, it seems like I shouldn't be using RegEx to do this. Thing is, my JavaScript skills are very limited, and the code I found used RegEx, so that's why I'm using them. But I'll try using the DOM. Thanks! — hoegenhaug, Oct 07 '11 at 10:10

score 3 · Answer 1 · answered Oct 07 '11 at 09:34

Try something like this:

var matches = data.match(/<div class="content">([^<]*)<\/div>/);

if (matches) 
    return matches[1]; // div content
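
A quick check of where this pattern matches and where it does not, as a minimal sketch. The helper name `extractFlat` and the sample strings are made up for illustration. Because `[^<]*` refuses any `<`, the match fails as soon as the div contains child markup, and the literal `<div class="content">` prefix fails when the tag carries other attributes.

```javascript
// Hypothetical helper wrapping the pattern above; not part of the original script.
function extractFlat(data) {
  var matches = data.match(/<div class="content">([^<]*)<\/div>/);
  return matches ? matches[1] : null;
}

// Made-up sample inputs.
var plain = '<p>menu</p><div class="content">Soup of the day</div>';
var nested = '<div class="content"><b>Soup</b> of the day</div>';
var extraAttr = '<div id="main" class="content">Soup of the day</div>';

console.log(extractFlat(plain));     // "Soup of the day"
console.log(extractFlat(nested));    // null: the inner <b> contains "<"
console.log(extractFlat(extraAttr)); // null: the id attribute breaks the literal prefix
```

So this is probably only safe when the target div is known to hold plain text and is written exactly as in the pattern.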

answered Oct 07 '11 at 09:34

fivedigit


score 1 · Answer 2 · answered Sep 18 '14 at 06:47

Try this:

<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>
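
A minimal sketch of how this pattern behaves on two made-up inputs. It tolerates extra attributes and line breaks, but the lazy `[\s\S]*?` stops at the first `</div>` it meets, so a content div that contains other divs appears to get cut short.

```javascript
// The pattern from the answer, written as a JavaScript regex literal.
var contentRe = /<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>/;

// Made-up sample inputs.
var flat = '<div id="x" class="content">line one\nline two</div>';
var nested = '<div class="content"><div class="day">Mon</div><div class="day">Tue</div></div>';

var flatMatch = flat.match(contentRe);
var nestedMatch = nested.match(contentRe);

console.log(flatMatch[1]);   // both lines, including the newline
console.log(nestedMatch[1]); // '<div class="day">Mon' (cut off at the first </div>)
```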

answered Sep 18 '14 at 06:47

Mahdi


score 0 · Answer 3 · answered Oct 07 '11 at 09:30

Here, try this:

<div[^>]*?class='content'[^>]*?>(.*?)</div>

Capture group 1 will have your content. Although you shouldn't be doing this with regexes :)
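
A small sketch of two things to watch for when using this pattern from JavaScript: it expects single quotes around the class value, and `.` does not match line breaks. The `relaxed` variant below is an assumed adaptation, not part of the answer; note that adding a group for the quote character moves the content to group 2.

```javascript
// Pattern as given in the answer (slash escaped for a JavaScript literal).
var original = /<div[^>]*?class='content'[^>]*?>(.*?)<\/div>/;

// Assumed variation: accept either quote style and let the body span lines.
var relaxed = /<div[^>]*?class=(["'])content\1[^>]*?>([\s\S]*?)<\/div>/;

var html = '<div class="content">Mon: soup\nTue: fish</div>'; // made-up sample

console.log(original.test(html)); // false: double quotes, and "." stops at \n
var m = html.match(relaxed);
console.log(m[2]);                // "Mon: soup\nTue: fish"
```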

answered Oct 07 '11 at 09:30

FailedDev


Sorry, didn't know that. I would love to learn to do it without! – hoegenhaug Oct 07 '11 at 10:13
Check out some nice DOM functions like getElementById. Use one to get the element, check its attributes (e.g. class='content'), and then get the element's content with another appropriate function. No need for regex at all :) DOM is your friend! – FailedDev Oct 07 '11 at 11:00

score 0 · Answer 4 · answered Oct 07 '11 at 09:42

This may help you:

    var divtxt = data.match(/<div[^>]*class="content"[^>]*>.*<\/div>/);

But it may stop at the wrong </div>.
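
To illustrate the problem, here is a minimal sketch using a variant of the pattern above, with `[^>]*` before the closing `>` and a capture group added for inspection. The sample string is made up. The greedy `.*` runs to the last closing div tag on the line rather than the one that closes the content div.

```javascript
// Variant of the pattern above; the capture group is added for illustration.
var greedyRe = /<div[^>]*class="content"[^>]*>(.*)<\/div>/;

// Made-up sample: the content div is followed by an unrelated div.
var page = '<div class="content">Menu</div><div class="footer">Contact</div>';

var greedyMatch = page.match(greedyRe);
console.log(greedyMatch[1]); // 'Menu</div><div class="footer">Contact'
```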

You should use jQuery or Prototype to turn it into a DOM object and use selectors to find the right div. Using jQuery, you would do something like:

    var divtxt = $(data).find(".content").first().html();

Remember to load the jQuery library first.
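
If you would rather not depend on a library, the same idea can be sketched with the browser's built-in `DOMParser`. This is only a sketch under assumptions: it runs in a browser, the page HTML is passed in as a string, and the helper name `extractContent` is made up for illustration.

```javascript
// Hypothetical helper: returns the inner HTML of the first div.content, or null.
// Assumes a browser environment, where DOMParser is available.
function extractContent(html) {
  // Parse into a separate document; nothing is inserted into the live page.
  var doc = new DOMParser().parseFromString(html, 'text/html');
  var div = doc.querySelector('div.content');
  return div ? div.innerHTML : null;
}

// Only run the demo where DOMParser exists (i.e. in a browser).
if (typeof DOMParser !== 'undefined') {
  var sample = '<div class="content"><div>Mon</div><div>Tue</div></div><div>footer</div>';
  console.log(extractContent(sample)); // '<div>Mon</div><div>Tue</div>'
}
```

Because the parser builds a real element tree, nested divs inside the content div are handled without any special casing.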

answered Oct 07 '11 at 09:42

zuloo


Okay, but should I still load the script I included in the question? – hoegenhaug Oct 07 '11 at 10:12
