
QUESTION:

How can I check whether a URL is valid and actually loads a page?

My current code only checks the status code, which means that a URL like http://fsd.com/ is considered valid although it does not load anything.

How can I check that the URL actually points to a website that can be loaded?


CODE:

$.ajax({
    url: link,
    dataType: 'jsonp',
    statusCode: {
        200: function() {
            console.log("status code 200 returned");
            validURL = true;
        },
        404: function() {
            console.log("status code 404 returned");
            validURL = false;
        }
    },
    error: function() {
        console.log("Error");
    }
});

EDIT: By valid, I mean that the page is at least partially loaded (as in at least the HTML and CSS are loaded) instead of loading forever or somehow failing without the status code being 404.

EDIT2: http://fsd.com actually returns a 404 now as it should...

EDIT3: Another example: https://dsd.com loads an empty page (status code 200) and http://dsd.com actually loads a page with content (status code 200). On my Node.js backend, the npm package "url-exists" indicates that https://dsd.com is invalid, while my frontend with the code shown in my question indicates it is a valid url. This is what the package code looks like: https://github.com/boblauer/url-exists/blob/master/index.js but I wanted to know what would be the best way according to SO users.

EDIT4:

Sadly, the request suggested by Addis is apparently blocked by CORS, and that stops the rest of my code from running, whereas my original request did not.

$.ajax({
    type: "HEAD",
    url: link,
    dataType: 'jsonp',
}).done(function(message, text, response) {
    const size = response.getResponseHeader('Content-Length');
    const status = response.status;
    console.log("SIZE: " + size);
    console.log("STATUS: " + status);
    if (size > 0 && status == "200") {
        $("#submitErrorMessage").css("display", "none");
        $('#directoryForm').submit();
    }
    else {
        $("#submitErrorMessage").css("display", "block");
        $("#submitLoading").css("display", "none");
    }
});

EDIT 5:

To be more precise, both requests trigger a warning in the browser console indicating that the response has been blocked because of CORS, but my original code is executed in its entirety while the other request never reaches the console.log() calls.

EDIT 6:

$.ajax({
    async: true,
    url: link,
    dataType: 'jsonp',
    success: function(data, status, jqxhr) {
        console.log("Response data received: ", data);
        console.log("Response data length: ", data.length);
        console.log("Response status code: ", status);
        if (status == "200" && data.length > 0) {
            $("#submitErrorMessage").css("display", "none");
            $('#directoryForm').submit();
        }
        else {
            $("#submitErrorMessage").css("display", "block");
            $("#submitLoading").css("display", "none");
        }
    },
    error: function(jqXHR, textStatus, errorThrown) {
        console.log("Error: ", errorThrown);
    }
});

Error:

Error:  Error: jQuery34108117853955031047_1582059896271 was not called
    at Function.error (jquery.js:2)
    at e.converters.script json (jquery.js:2)
    at jquery.js:2
    at l (jquery.js:2)
    at HTMLScriptElement.i (jquery.js:2)
    at HTMLScriptElement.dispatch (jquery.js:2)
    at HTMLScriptElement.v.handle (jquery.js:2)
TheProgrammer
  • What do you mean that a URL is valid and actually can be loaded? If you get a 200 it can be reached and loaded. Can you be more specific about your needs? – blurfus Feb 18 '20 at 17:47
  • Since there is no real definition of a valid page you really cannot. http code 200 is the only indication that the page opened successfully. – Nawed Khan Feb 18 '20 at 17:50
  • @blurfus Edited the question. – TheProgrammer Feb 18 '20 at 18:19
  • Well, it's still not clear to me. If you get a 200 status code, the page is good to be loaded (and in many cases, its data is already in the response). – blurfus Feb 18 '20 at 18:24
  • @blurfus You might be right. Check EDIT2. I wonder why it returned 200 before although it does not load :/ – TheProgrammer Feb 18 '20 at 18:26
  • Does this answer your question? [using javascript to detect whether the url exists before display in iframe](https://stackoverflow.com/questions/10926880/using-javascript-to-detect-whether-the-url-exists-before-display-in-iframe) – Heretic Monkey Feb 18 '20 at 18:29
  • @HereticMonkey No. This is already what I have in my code. – TheProgrammer Feb 18 '20 at 18:42
  • Right, because that's how the problem is solved. – Heretic Monkey Feb 18 '20 at 18:44
  • @HereticMonkey Addis 's answer provided what I was looking for. I was looking for an efficient way to make sure content is actually being loaded, which is what he provided (as opposed to merely checking the status code as I am currently doing and as the question you linked to shows). – TheProgrammer Feb 18 '20 at 18:45

4 Answers


A successful response without content "should" return a 204: No Content but it doesn't mean that every developer implements the spec correctly. I guess it really depends on what you consider "valid" to mean for your business case.

Valid = 200 && body has some content?

If so, you can test this in the success callback.

$.ajax({
    url: link,
    dataType: 'jsonp',
    success: function (response) {  
        // todo: test the response for "valid"
        // proper length? contains expected content?
    },  
    statusCode: {
        200: function() {
            console.log( "status code 200 returned");
            validURL = true;
        },
        404: function() {
            console.log( "status code 404 returned");
            validURL = false;
        }
    },
    error:function(){
        console.log("Error");
    }
});

I think the word "valid" is used a bit loosely here. Looking at the code snippet, I can see that you are using HTTP status codes to decide whether the URL is valid or not. However, based on the description, it is clear that you consider the resource (pointed to by the URL) to be valid only if it is a web page. I would like to stress that HTTP can be used to access resources that need not have a web page representation.

I think you need to go a bit deeper and retrieve that information (whether it is a web page representation) from the HTTP response that you receive; relying on the status code alone would be misleading. One clear indicator is a content-type: text/html response header.

Sample response from accessing www.google.com:

date: Tue, 18 Feb 2020 17:51:12 GMT
expires: -1
cache-control: private, max-age=0
content-type: text/html; charset=UTF-8
strict-transport-security: max-age=31536000
content-encoding: br
server: gws
content-length: 58083
x-xss-protection: 0
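
As a rough sketch of that check, the header value can be reduced to its media type and compared against the HTML types. The helper name isHtmlContentType is made up for illustration, and treating application/xhtml+xml as a page is an assumption on my part:

```javascript
// Hypothetical helper (not part of jQuery or any library): decides whether
// a Content-Type header value describes an HTML document.
function isHtmlContentType(contentType) {
    if (typeof contentType !== 'string') {
        return false; // header missing, so we cannot call it a web page
    }
    // Drop parameters such as "; charset=UTF-8" and compare the media type
    // case-insensitively.
    const mediaType = contentType.split(';')[0].trim().toLowerCase();
    // Assumption: XHTML responses also count as "a page".
    return mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
}

console.log(isHtmlContentType('text/html; charset=UTF-8')); // true
console.log(isHtmlContentType('application/json'));         // false
```

In a jQuery callback the value would come from something like jqXHR.getResponseHeader('Content-Type'), which is only readable when the request is not blocked by CORS.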
Maverick

The HEAD request is used to get meta-information contained in the HTTP headers. The good thing is that the response doesn't contain the body. It's pretty speedy and there shouldn't be any heavy processing going on in the server to handle it. This makes it handy for quick status checking.

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification. www.w3.org

$.ajax({
    type: "HEAD",
    async: true,
    url: link,
    dataType: 'json', 
}).done(function(message,text,response){
    const size = response.getResponseHeader('Content-Length');

    //optionally you may check for the status code to know if the request has been successfully completed
    const status = response.status;
});

Content-Length is one of the metadata fields available in the response to a HEAD request. It gives the size of the body in bytes, so by checking it you can tell whether the response body has any content without loading the whole page.
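
One caveat when reading that header: a server may omit Content-Length altogether (for example when it streams the body with chunked transfer encoding), so a missing value is not proof of an empty page. A minimal sketch of a three-way interpretation follows; the helper name bodyPresence is hypothetical:

```javascript
// Hypothetical helper: interprets a Content-Length header value.
// Returns true  -> the server announced a non-empty body,
//         false -> the server announced an empty body,
//         null  -> header missing or malformed, so the size is unknown.
function bodyPresence(contentLength) {
    const text = String(contentLength).trim();
    if (!/^\d+$/.test(text)) {
        return null; // covers null, undefined, '' and non-numeric values
    }
    return Number(text) > 0;
}

console.log(bodyPresence('58083')); // true
console.log(bodyPresence('0'));     // false
console.log(bodyPresence(null));    // null
```

Treating null as "unknown" rather than "invalid" leaves room to fall back to a GET request and inspect the body itself.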

EDIT: The above code is for a dataType of json. For a dataType of jsonp, the success and error callbacks take care of the response, like the following:

$.ajax({
    url: link,
    dataType: 'jsonp',
    crossDomain: true,
    data: data, // request parameters; must be defined by the caller
    success: function(data, status, jqxhr) {
        console.log("Response data received: ", data);
        console.log("Response data length: ", data.length);
        console.log("Response status code: ", status);
    },
    error: function(jqXHR, textStatus, errorThrown) {
        console.log("Error: ", errorThrown);
    }
});
Addis

What you are trying to accomplish is not very specific, so I'm not going to give you a code example on how to do this, but here are some pointers.

There are different ways you could get a response. The status code is not tied to the response body: you could get a 200 response with no data, or a 500 error with some data, which could be an HTML page showing the error, a JSON object, or even a string specifying what went wrong.

When you say "actually loads a page", I guess you are referring to an HTML response. You can check the Content-Type response header for text/html and check the Content-Length header to see whether there is content in the response. Even with those checks, it's hard to tell whether the HTML actually displays any content.

It really depends on what you are looking for specifically. My suggestion is to check the Content-Type and Content-Length headers, keeping in mind that every website might implement the HTTP protocol a little differently.
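
For readers who do want something concrete, here is one possible sketch that combines those pointers with a timeout, so a URL that "loads forever" is reported as not loading. It is meant for the Node.js backend mentioned in the question (Node 18+ has a global fetch); in a browser, cross-origin responses would still be blocked by CORS. The name checkUrl, the 5-second default, and the rule "2xx + text/html + non-empty body" are all assumptions, not an established definition of a valid page:

```javascript
// Hypothetical helper: resolves to true when the URL answers with a 2xx
// status, an HTML content type and a non-empty body within timeoutMs.
// fetchImpl is injectable so the function can be exercised without a network.
async function checkUrl(url, fetchImpl = fetch, timeoutMs = 5000) {
    const controller = new AbortController();
    // Abort the request if the server takes too long to respond.
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchImpl(url, { signal: controller.signal });
        const contentType = (response.headers.get('content-type') || '').toLowerCase();
        const body = await response.text();
        return response.ok
            && contentType.startsWith('text/html')
            && body.trim().length > 0;
    } catch (err) {
        // Network failure, DNS error, or timeout: treat as "does not load".
        return false;
    } finally {
        clearTimeout(timer);
    }
}
```

It could be called as checkUrl(link).then(function(ok) { console.log(ok); }). Reading the body with a GET is slower than a HEAD request, but it does not depend on the server sending Content-Length.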

Roberto Murguia