1

I'm using Ajax to load an HTML document from the web (to extract data automatically), like so:

$.ajax({
    url: 'http://example.com/index.php',
    crossDomain: true,
    accepts: "html",
    dataType: "html"
});

I expect it to load only the HTML document. However, all images referenced by the document are loaded too. Inspecting the browser's network activity in the developer console, I can see that the images are requested as well.

Example of the returned HTML document:

<html>
<body>
<div>Data I wanted: item price, stock availability etc.</div>
<img src="../img/very-large-image.jpg" />
</body>
</html>

The browser requests '../img/very-large-image.jpg' after receiving the HTML document.

Is there a way to load the HTML document without the browser requesting the images? I would like to load the HTML only. This issue affects my app's performance, especially when the images on that page are large.

I have searched the internet for answers but haven't found any related articles yet. I would appreciate your help.

Nik

3 Answers

2
  1. Images and scripts start loading once you place them in the DOM
  2. We can remove or modify all images before placing the received HTML into the DOM

The following function should work.

$.ajax({
    url: 'http://example.com/index.php',
    crossDomain: true,
    accepts: "html",
    dataType: "html",
    success: function (data) {
        var $html = $(data);
        // Remove src from all images
        $('img', $html).attr('src', '');
        // Now set the HTML on the container
        $('#container').html($html);
    }
});
Jagdish Idhate
  • I have tested your answer, but the browser starts requesting the images as soon as `$(data)` is called. – Nik Jan 02 '16 at 03:24
  • Yeah, I missed that: by calling `$(data)` we are effectively putting it into the DOM already. I guess we need to replace all image occurrences in `data` with a regex. – Jagdish Idhate Jan 02 '16 at 05:16
  • I recommend using this [method](https://www.xfive.co/blog/how-to-parse-html-response-without-loading-any-images/?comment=new#comment-12593). Look at Jean-Baptiste's comment to make it work as intended, though. The method is based on loading the HTML into a new DOM document so that resources are not loaded. It is much faster than the regex solution on relatively big pages. – Mosset Jérémie Aug 02 '18 at 12:00
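
The detached-document idea from the last comment can be sketched roughly as follows. This is a minimal sketch, not the linked article's exact code: it assumes a browser that provides `DOMParser`, and the helper names `parseInert`, `extractText`, and `loadData`, as well as the `div` selector, are hypothetical. A document produced by `DOMParser` is never attached to the page, so browsers should not request the images it references.

```javascript
// Parse an HTML string into a detached document (assumes a browser DOMParser).
// The parsed document is never rendered, so its <img> sources are not requested.
function parseInert(html) {
  return new DOMParser().parseFromString(html, 'text/html');
}

// Hypothetical helper: return the trimmed text of the first node matching
// `selector`, or null when nothing matches.
function extractText(html, selector) {
  var node = parseInert(html).querySelector(selector);
  return node ? node.textContent.trim() : null;
}

// Hypothetical usage with the request from the question.
function loadData(url, onText) {
  return $.ajax({
    url: url,
    dataType: 'html',
    success: function (data) {
      // `data` stays a plain string until parseInert() runs;
      // it is never passed to $(data).
      onText(extractText(data, 'div'));
    }
  });
}
```

Calling `loadData('http://example.com/index.php', console.log)` would then log the text of the first `div` in the response. The key design choice is that the response string is only ever handed to the detached parser, never to `$(data)` or `innerHTML` on the live page.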
0

The following regex replaces all occurrences of <head>, <link>, <script>, <style>, and <img> tags, as well as background and style attributes, in the data string returned by the Ajax load.

REGEX:

(<(\b(img|style|script|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))

Test regex: https://regex101.com/r/nB1oP5/1

I hope there is a better way to work around this (other than a regex replace).

Nik
0

Try this regex. It removes every src attribute and every <link> tag.

this.response.replace(/src=((['][^']*['])|(["][^"]*["]))/igm, "").replace(/<link[^>]*>/igm, ""); 
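
For use outside an XMLHttpRequest handler, the replacement above could be wrapped in a small function that works on any HTML string. This is only a sketch: the name `stripResourceRefs` and the sample string are hypothetical, and like any regex-based approach it can miss cases such as unquoted `src` values or `srcset` attributes.

```javascript
// Hypothetical wrapper around the two replacements shown above.
function stripResourceRefs(html) {
  return html
    // Drop every quoted src="..." or src='...' attribute.
    .replace(/src=((['][^']*['])|(["][^"]*["]))/igm, '')
    // Drop every <link ...> tag (stylesheets, icons, and so on).
    .replace(/<link[^>]*>/igm, '');
}

// Sample input modelled on the document in the question.
var sample = '<div>Data I wanted</div>' +
  '<link rel="stylesheet" href="style.css">' +
  '<img src="../img/very-large-image.jpg" />';

var cleaned = stripResourceRefs(sample);
```

Here `cleaned` no longer contains the image URL or the `<link>` tag, so passing it on to `$(cleaned)` should not trigger those requests.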
Mohamed hesham