I am writing a small script that takes a bunch of links from a page, fetches them and scours the results for some data.
E.g. like this:

let listLinks = $('.item a');

listLinks.each(function() {
    let url = this.href;

    fetch(url, {
        credentials: 'include'
    })
    .then(response => response.text())
    .then(function(html) {
        // Running the selector against the raw HTML string is what triggers the requests
        let name = $('#title h1', html);
    });
});

My problem is that as soon as the selector runs on the response, the network tab in my browser's dev tools lights up with requests for a ton of resources, as if something (jQuery?) is loading the entire page!

What the hell is going on here?

I don't want to load the entire page (resources and all); I just want to extract some text from the HTML response!

Edit: After some more scrutiny, I discovered that it only makes network requests for images on the fetched page, not for scripts or stylesheets.

It does not make these requests if I process the HTML some other way, say by calling .indexOf() on it. They only happen when I traverse it with jQuery.

Edit2: Poking around in dev tools, the network tab has an "initiator" column. It says this is the initiator for the requests: github code. I don't know what to make of that however...

P.S. Inb4 "just regex it".

martixy
  • `I don't want to load the entire page, I just want to take a bunch of text from the html response!` Unfortunately that's not how AJAX works. When you make a request you will receive the ***entire*** response back. In this case you then need to dissect that response to get only the relevant data you want. This is the same way jQuery's `load()` method works. If you want to change this behaviour, amend the endpoints you call to only return the relevant HTML. – Rory McCrossan Aug 31 '18 at 08:36
  • By "the entire page" I also meant all linked resources, such as images and scripts. Under closer scrutiny, though, I noticed something else relevant: it only requests images (not scripts or stylesheets). I will edit my question. – martixy Aug 31 '18 at 09:03

2 Answers

1

I've discovered the cause:

My code above (relevant line):
$('#title h1', html)
is equivalent to
$(html).find('#title h1')

And $(html) creates DOM elements from the string: actual, literal DOM objects.

When an <img> element with a src is created in the current document (and the HTML I parse contains several), the browser automatically issues a network request for it, even if the element is never inserted into the page.
Relevant Stack Overflow question: Set img src without issuing a request

With the code in the question, the created DOM elements are still associated with the current document (as noted here), so the browser automatically requests any new <img> sources it doesn't have yet. The correct solution is to parse the HTML into a separate document, e.g.

let parser = new DOMParser();
let doc = parser.parseFromString(html, "text/html");
let name = $('#title h1', doc);

No network requests go out in this case.
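
For completeness, here is a rough sketch of the whole fetch-and-extract loop built on the same idea, using only DOMParser and querySelector rather than jQuery. The helper names extractText and scrapeTitles are hypothetical, and the optional parser and fetchImpl parameters are my own addition so the functions can be exercised outside a browser. Treat it as one possible arrangement, not the definitive one.

```javascript
// Parse an HTML string into a separate document and return the trimmed text
// of the first element matching `selector`, or null if nothing matches.
// `parser` is an assumption for testability; in a page the default is used.
function extractText(html, selector, parser = new DOMParser()) {
    const doc = parser.parseFromString(html, 'text/html');
    const el = doc.querySelector(selector);
    return el ? el.textContent.trim() : null;
}

// Fetch all URLs in parallel and extract the matching text from each response.
// `fetchImpl` (hypothetical parameter) defaults to the global fetch.
function scrapeTitles(urls, selector, fetchImpl = fetch, parser) {
    return Promise.all(urls.map(async (url) => {
        const response = await fetchImpl(url, { credentials: 'include' });
        const html = await response.text();
        return { url, title: extractText(html, selector, parser) };
    }));
}
```

In the page from the question, the URL list could be collected with $('.item a').map(function () { return this.href; }).get() and passed to scrapeTitles(urls, '#title h1'). Because the parsed document is separate from the current one, the <img> tags in it should not trigger requests.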

JSFiddle

martixy
0

The problem is that you are using fetch. Use jQuery's $.ajax() instead:

$.ajax({
  url: 'URL',
  type: 'GET',
  dataType: 'html',
  success: function(responseHTML) {
    console.log(responseHTML);
  } 
});
Rory McCrossan