0

I need to load the body of an HTML page without any style attributes, links, images, or anything else that is not plain text. I would like to do it in PHP and have tried every solution I could find, but I have not solved it. I load the HTML page with an AJAX call to my script, then use a regular expression to extract the body, which I then want cleaned. Can you help me? This is the AJAX call:

$.ajax({
    type: "GET",
    url: "core/proxy.php?url=" + cerca,
    success: function (data) {
        // Keep only what lies between <body ...> and </body>
        var body = data.replace(/^[\S\s]*<body[^>]*?>/i, "")
                       .replace(/<\/body[\S\s]*$/i, "");
        $("div#risultato").html(body);
    },
    error: function () {
        alert("failed");
    }
});
}); // closes an enclosing handler not shown here
Karthik Keyan
  • How about showing us the PHP solution you tried? – Jonnix Sep 03 '15 at 16:41
  • Doing what you describe is, in general, a complicated problem; it's not just a simple regular expression thing. – Pointy Sep 03 '15 at 16:44
  • Always worth pointing out - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Professor Abronsius Sep 03 '15 at 17:00
  • I forgot to mention that the HTML pages are all articles from these links: http://www.dlib.org/dlib/november14/11contents.html , http://rivista-statistica.unibo.it/issue/view/467 . For all other sites that I search, I must display the body content. – Stefano Zaniboni Sep 03 '15 at 18:02

3 Answers

2

You could use jQuery to just get the text content of the body.

So, in your success function, you would take the data, convert it to a jQuery-object and insert the text in your div.

$('div#risultato').html($(data).find('body').text());
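
One caveat: when `data` is a complete HTML document, `$(data)` may drop the `<html>`, `<head>` and `<body>` wrappers while parsing, so `.find('body')` can come back empty. Below is a minimal sketch that parses the string with the browser's `DOMParser` instead. `extractBodyText` and its optional `parser` argument are hypothetical names introduced here, not part of jQuery.

```javascript
// Sketch: return the text content of <body> from a full HTML document string.
// `extractBodyText` and the optional `parser` argument are illustrative names.
function extractBodyText(html, parser) {
    // DOMParser is a standard browser API; a stand-in can be injected for tests.
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(html, 'text/html');
    // textContent drops all markup but keeps the text of every descendant,
    // including the contents of inline <script> and <style> elements.
    return doc.body ? doc.body.textContent : '';
}

// Possible use inside the success callback (browser only):
//   $('div#risultato').text(extractBodyText(data));
```

Using `.text()` rather than `.html()` for the insertion keeps the extracted string from being re-parsed as markup.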
Florian Grell
1

You could clear the style attributes, tag by tag, after inserting the body:

function clearStyles(element) {
    element.setAttribute('style', '');
    for (var i = 0; i < element.children.length; i++) {
        clearStyles(element.children[i]);
    }
}

clearStyles(document.body);

http://jsfiddle.net/n9ocxa0g/

Or directly with jQuery:

jQuery('body *').attr('style', '');
0

Jose Antonio Riaza Valverde, I applied your correction but nothing changes:

$.ajax({
    // type of the request
    type: "GET",
    // URL of the resource to contact
    url: "core/proxy.php?url=" + cerca,
    // action on success
    success: function (data) {
        var body = data.replace(/^[\S\s]*<body[^>]*?>/i, "")
                       .replace(/<\/body[\S\s]*$/i, "");
        $("div#risultato").html(body);
        clearStyles(document.getElementById('risultato'));
    },
    // action on error
    error: function () {
        alert("Call failed");
    }
});
}); // closes an enclosing handler not shown here

and the function:

function clearStyles(element) {
    element.setAttribute('style', ' ');
    element.setAttribute('img', ' ');
    element.setAttribute('a', ' ');
    for (var i = 0; i < element.children.length; i++) {
        clearStyles(element.children[i]);
    }
}
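
This may explain why the images and links stay: `setAttribute('img', ' ')` only adds an attribute named `img` to each element, and it does not remove `<img>` elements. The same goes for `'a'`. Below is a hedged sketch of one way to do the cleanup with standard DOM methods, assuming a modern browser. `stripToPlainContent` is a hypothetical name, and the selector list is an assumption about what counts as "not plain text".

```javascript
// Sketch: strip non-text content from everything below `root`.
// `stripToPlainContent` is an illustrative name, not an existing API.
function stripToPlainContent(root) {
    // 1. Remove elements that never carry plain text.
    root.querySelectorAll('img, link, style, script').forEach(function (el) {
        el.remove();
    });
    // 2. Remove inline styles from the remaining descendants.
    root.querySelectorAll('[style]').forEach(function (el) {
        el.removeAttribute('style');
    });
    // 3. Replace every link with a text node holding its visible text.
    root.querySelectorAll('a').forEach(function (a) {
        a.replaceWith(a.textContent);
    });
}

// Possible use after inserting the body (browser only):
//   stripToPlainContent(document.getElementById('risultato'));
```

Note that `querySelectorAll` only matches descendants, so the container itself (here `div#risultato`) keeps its own attributes.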