
Users of my site can copy an entire page from any other site (with Ctrl+A) and paste it into a special textarea, so that some useful data can be extracted from the HTML.

But now I have run into a problem. When I wrap the pasted HTML code with jQuery:

var page = $(html);

my browser (Chrome) starts downloading all the pictures in this HTML (and maybe other resources too). This is bad for me, because my site uses a secure SSL connection, and loading pictures from another site makes the browser strike out its security lock icon.

Can I turn off picture downloading? If I can't, which library can I use to parse the HTML without downloading unnecessary content?

leavelllusion

2 Answers


You could use a regular expression to remove all the img tags before passing the HTML to jQuery:

For example:

$( html.replace( /<img .*?>/ig, '' ) );

For more information about regular expression modifiers and syntax, check out MDN: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
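A possible refinement of the one-liner above, offered as a sketch rather than a drop-in fix: the pattern `<img .*?>` needs a literal space right after `img`, and `.` does not match line breaks, so an img tag whose attributes are split over several lines is left in place (and would still be downloaded). The variant below uses `[^>]*` instead, which does cross line breaks. The helper name `stripImgTags` is hypothetical. Like any regular expression applied to HTML, it can still be fooled, for example by a `>` inside a quoted attribute value.

```javascript
// Hypothetical helper (not part of the original answer): removes every <img>
// tag from an HTML string so jQuery never creates image elements for them.
function stripImgTags(html) {
  // <img\b : "img" as a complete tag name, so <imgfoo> is left alone
  // [^>]*  : any attributes, including line breaks, up to the first ">"
  // gi     : every occurrence, case-insensitive (matches <IMG ...> too)
  return html.replace(/<img\b[^>]*>/gi, '');
}

// Usage in the browser, with jQuery loaded:
//   var page = $(stripImgTags(html));
console.log(stripImgTags('<p>a<IMG\n  src="x.png"\n  alt="y">b</p>'));
// prints "<p>ab</p>"
```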

David Riccitelli

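$(html) makes jQuery turn the string into real DOM elements. It does this with the browser's built-in HTML parser, in the context of the current page's document, so even though the new elements are never inserted into the page, the browser treats them as ordinary elements of that document and starts fetching some of their resources right away. Images are the clearest case: an img element starts downloading its src as soon as it is created, and depending on the browser, other referenced resources may be requested as well.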

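You can use $.parseXML(html) instead, which builds a separate XML document rather than elements of the current page, but then the code has to be well-formed XML (XHTML). A page copied from an arbitrary site usually is not.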

If images are your only concern, however, you can use this code:

// take the src attribute, change it to hiddensrc
// credits: http://stackoverflow.com/a/1310706/608886
// only the attribute name changes; the URL and the rest of the tag are kept
html = html.replace(/(<img\b[^>]*?\s)src(\s*=)/gi, "$1hiddensrc$2");

// parse the code
var parsed = $(html);

///////////////
//
//   do whatever you want here
//
//////////////

// put the src attribute back at your discretion
parsed.find('img[hiddensrc]').each(function(){ 
    $(this).attr('src',$(this).attr('hiddensrc')); 
}); 
Silviu-Marian
  • Thanks for the answer; it is correct, but the replace function doesn't seem to work as it should. In my example, it turns a 200 KB HTML string into a 1.5 KB string. Can you check the code? Also, how can I modify the regexp to replace the src attribute on all existing tags? – leavelllusion Jul 12 '12 at 17:04
  • I'm not that good with regex. I'm afraid you're going to have to find a better one. – Silviu-Marian Jul 12 '12 at 18:03
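The comment above asks how to rename src on every tag, not only on img. A minimal sketch under that assumption, using a hypothetical helper `hideSrc`: it renames the attribute name `src` to `hiddensrc` inside any tag and leaves the URL itself untouched, so the string should only grow by 6 characters per renamed attribute instead of losing large chunks. It is still a regular expression, not an HTML parser: it does not touch `srcset`, and text such as ` src=` inside another attribute's quoted value can fool it.

```javascript
// Hypothetical helper (not from the answer above): rename the src attribute of
// every tag to hiddensrc, so building DOM nodes from the string does not fetch
// anything through src. The attribute value (the URL) is left as-is.
function hideSrc(html) {
  // (<[a-z][^>]*?\s) : a tag opening, up to the whitespace just before "src"
  // src(\s*=)        : the attribute name "src" followed by "=" (spaces allowed)
  // data-src and srcset do not match: data-src has no whitespace before "src",
  // and srcset has no "=" right after "src".
  return html.replace(/(<[a-z][^>]*?\s)src(\s*=)/gi, '$1hiddensrc$2');
}

var input = '<p><img alt="x" src="a.png"><script src="b.js"></script></p>';
console.log(hideSrc(input));
// prints <p><img alt="x" hiddensrc="a.png"><script hiddensrc="b.js"></script></p>
```

If every element should get its src back after parsing, the restore loop in the answer would need the selector '[hiddensrc]' rather than 'img[hiddensrc]'.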