How do I detect with JavaScript or jQuery whether a URL points to a web page or a binary file?

Question

Using JavaScript and jQuery, I'm crawling a list of links on a web page; each link points to either a web page or a large binary file (PPT, etc.).

How do I detect whether the content is a web page ('text/html') or not? I'm pretty sure it involves inspecting the HTTP headers using $.ajax, and I know there are some similar questions already posted, but I can't find an example that fits this particular case.

Possible duplicate of [jquery how to check response type for ajax call](https://stackoverflow.com/questions/3741574/jquery-how-to-check-response-type-for-ajax-call) — Zackary Murphy, Nov 15 '17 at 15:50
You cannot, until you actually visit the URL and observe the `Content-Type` header value. — 31piy, Nov 15 '17 at 15:52

score 3 · Accepted Answer · answered Nov 15 '17 at 15:52

The lightest method is to check the extension in the URL. Alternatively, send a HEAD request via Ajax and read the Content-Type header:

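var url = 'someurl'; // placeholder: the link to test
var xhttp = new XMLHttpRequest();
// HEAD requests only the response headers, so the body is not downloaded.
xhttp.open('HEAD', url);
xhttp.onreadystatechange = function () {
  // Wait until the request has completed before reading the headers.
  if (this.readyState === this.DONE) {
    console.log(this.status);
    // A web page typically reports something like "text/html; charset=utf-8".
    console.log(this.getResponseHeader("Content-Type"));
  }
};
xhttp.send();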

Mateusz Kudej

Thanks @Mateusz-Kudej, that was exactly the code I was looking for! – El-Jus Nov 16 '17 at 19:20

I'm glad I could help ;) @El-Jus – Mateusz Kudej Nov 20 '17 at 11:00

score 2 · Answer 2 · answered Nov 15 '17 at 15:52

You can't reliably infer the type from the URL. It may contain an extension such as exe or html, but it doesn't have to, and even when it does, the extension is no guarantee of the actual content.

The closest you can get without completely downloading and examining the file is probably to fire off a HEAD HTTP request to the URL. This should return the response headers without the body, and those headers should in turn include Content-Type. All of this depends on the implementation and configuration of the backend, though, so there is no guarantee that the request will be answered correctly, or even answered at all.
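As a rough illustration of the HEAD approach, here is a minimal sketch using `fetch`. The function names `isHtmlContentType` and `looksLikeWebPage` are hypothetical, and treating `application/xhtml+xml` as a web page is my own assumption. It returns `null` whenever the server gives no usable answer, which also covers cross-origin requests blocked by the browser.

```javascript
// Hypothetical helper: does a Content-Type header value describe an HTML page?
function isHtmlContentType(headerValue) {
  if (!headerValue) return false; // header missing or empty
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mediaType = headerValue.split(";")[0].trim().toLowerCase();
  // Assumption: XHTML counts as a web page too.
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}

// Hypothetical wrapper: resolves to true, false, or null (could not determine).
async function looksLikeWebPage(url) {
  try {
    // HEAD asks for the headers only, so a large binary is not downloaded.
    const response = await fetch(url, { method: "HEAD" });
    if (!response.ok) return null; // e.g. the server rejects HEAD
    const header = response.headers.get("Content-Type");
    return header === null ? null : isHtmlContentType(header);
  } catch (err) {
    return null; // network failure or request blocked (CORS, for instance)
  }
}
```

A caller could then do `looksLikeWebPage(link.href).then(function (isPage) { ... })` and fall back to some other heuristic when the result is `null`.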

Thanks @Timo for the guidance on assuming a file extension is genuine, much appreciated. — El-Jus, Nov 16 '17 at 19:21

score 1 · Answer 3 · answered Nov 15 '17 at 15:50

If you have the file names, you can use `filename.split('.').pop()`, which returns the file's extension.
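If you only have full URLs rather than bare file names, a plain `split('.').pop()` picks up the host and query string as well: for `https://example.com/download?id=42` it returns `com/download?id=42`. A minimal sketch that looks only at the last path segment follows; `extensionFromUrl` is a hypothetical name, and it assumes absolute URLs, since `new URL` throws on relative ones when no base is given.

```javascript
// Hypothetical helper: extract a lowercase extension from an absolute URL,
// or return "" when the last path segment has none.
function extensionFromUrl(urlString) {
  // pathname excludes the host, the query string and the fragment.
  const path = new URL(urlString).pathname;
  const lastSegment = path.split("/").pop();
  const dot = lastSegment.lastIndexOf(".");
  // No dot, or a leading dot only (".htaccess"), means no extension.
  if (dot <= 0) return "";
  return lastSegment.slice(dot + 1).toLowerCase();
}
```

An empty result, or an unrecognised extension, still says nothing about the actual content, so this is a cheap first filter at best.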

Ryan Knutson


Yeah, what could go wrong? Joking aside, what if the link url does not actually contain the file name? For example some CMSs don't expose the actual file name in the URL – Federico klez Culloca Nov 15 '17 at 15:52

Not always, that's my point. It's not bad code per se, it's just an incomplete solution for an incomplete requirement. – Federico klez Culloca Nov 15 '17 at 15:57
