Need: To check whether a URL is valid (i.e. it returns status code 200)

Research: I checked the following threads here - Link1, Link2, Link3

Issue: I tried the solutions provided in those links, but they do not seem to work for URLs whose HTTP status code is not 200. The program execution hangs at connection.getResponseCode()

Resources: Working link - url1 (status code 200; the browser prompts to download the file). Non-working link - url2 (status code 500).

Browser developer tool, showing the return codes for the working URL & not working URL

CODE:

Method 1 -

    try {
        URL u = new URL(url1); // url1 -> works, url2 -> does not work
        HttpURLConnection connection = (HttpURLConnection) u.openConnection();
        connection.setRequestMethod("GET");
    
        int responseCode = connection.getResponseCode(); // Program hangs here for url2
        if (responseCode == HttpURLConnection.HTTP_OK) {
            System.out.println("URL is working fine.");
        } else {
            System.out.println("URL returned a response code: " + responseCode);
        }
    } catch (Exception e) {
        System.out.println("Error occurred while checking the URL: " + e.getMessage());
    }

Method 2 (using Apache Commons UrlValidator class) -

    UrlValidator urlValidator = new UrlValidator();
    System.out.println(urlValidator.isValid(url1)); // Working URL
    System.out.println(urlValidator.isValid(url2)); // Invalid URL, but still shows valid

Added: It would help if anyone could point out what I can change so that getResponseCode() returns the status code when the URL is not working, or suggest an alternative method.

iCoder
  • First, you need to define what "valid" means. Do you mean the syntax defined by [RFC 3986](https://datatracker.ietf.org/doc/html/rfc3986)? Or do you mean whether or not a URL returns a non-error response? – Code-Apprentice Jun 24 '23 at 16:34
  • @Code-Apprentice my very first line in the query states valid for my case is (return Status Code - 200) – iCoder Jun 24 '23 at 16:36
  • Note that Apache Commons is only checking the first definition. It only validates that the URL is correct according to the RFC definition. It does NOT check if the URL actually exists nor that it doesn't return any errors. – Code-Apprentice Jun 24 '23 at 16:37
  • Also, you are only checking that `GET` works correctly. What if the URL only allows `POST` and returns a 201 response code on success? Would you consider it still "valid"? – Code-Apprentice Jun 24 '23 at 16:38
  • In this case, that link is used to obtain a file, so I check only for GET. Before I download the file, I need to verify that the link returns status code 200. I hope that is clearer now. – iCoder Jun 24 '23 at 16:45
  • Yes, checking for errors makes sense and is common practice. However, this is not the same as checking if the URL is valid. Your approach to check the status code in the response is the right way to go. – Code-Apprentice Jun 24 '23 at 16:58
  • With that said, you may also want to use `UrlValidator` before making the request so that you don't waste time when the URL isn't valid. But this is a separate step from checking if the response status code is 200. – Code-Apprentice Jun 24 '23 at 17:01
  • @iCoder you need to check first whether the URL is valid using Apache Commons UrlValidator, but it only checks the pattern of the URL; it does not check whether the URL is accessible. To check whether the URL is accessible, you can use Method 1. Therefore I recommend using Method 2 first and then Method 1. – Sachin Mewar Jun 25 '23 at 17:08

2 Answers

You are talking about two separate things here:

  1. Whether or not a URL is valid
  2. Whether or not a request to the URL will succeed with a 200 response code.

`UrlValidator` from Apache Commons only checks the first one, so your comment `// Invalid URL, but still shows valid` is incorrect: the URL is actually valid. A response code other than 200 does not make the URL invalid. A 500 error means the server hit an unexpected condition while processing the request, often an unhandled exception in the server-side code.

You should probably check that the URL is valid with `UrlValidator` before attempting to make a request. This helps you avoid making requests when the URL is invalid. Since this validation runs locally with no network I/O, it is much faster than making a network request that is doomed to fail.
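To illustrate what such a local pre-check looks like, here is a minimal sketch that uses only `java.net.URI` from the JDK as a stand-in for `UrlValidator`. Its rules are not the same as those of Apache Commons, and the names `UrlSyntaxCheck` and `looksLikeHttpUrl` are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlSyntaxCheck {

    /**
     * Purely local check: no network request is made.
     * Accepts only absolute http/https URIs that have a host component.
     * This is a stand-in for UrlValidator, not a reimplementation of it.
     */
    public static boolean looksLikeHttpUrl(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate); // throws on illegal characters, etc.
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical example inputs
        System.out.println(looksLikeHttpUrl("https://example.com/file.zip"));
        System.out.println(looksLikeHttpUrl("http://exa mple.com"));
    }
}
```

A URL that passes this check can still answer with a 500, which is exactly the distinction between the two points above.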

But even if the URL is valid, you will still need to make the request and check the response code, just like you are doing, in order to verify that the request actually succeeds.

Code-Apprentice

For anyone who is facing a similar issue, I managed a workaround using the HTTPstatus API.

The connection.getResponseCode() call hangs for the invalid case; I don't know why. Maybe some experts can look into it if they have time.
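One possible explanation, not confirmed for url2: `HttpURLConnection` uses a timeout of 0 (wait indefinitely) for both connecting and reading unless told otherwise, so a server that accepts the connection but is slow to answer keeps `getResponseCode()` blocked. Below is a minimal sketch that sets both timeouts. The names `UrlStatusCheck`, `fetchStatus` and `timeoutMillis` are hypothetical, and the five-second value in `main` is an arbitrary assumption.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLConnection;
import java.util.OptionalInt;

public class UrlStatusCheck {

    /**
     * Sends a GET request and returns the HTTP status code, or an empty
     * OptionalInt if the URL is unusable, the connection fails, or a
     * timeout expires. Names and parameters are illustrative only.
     */
    public static OptionalInt fetchStatus(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            URLConnection raw = new URI(url).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return OptionalInt.empty(); // not an http/https URL
            }
            connection = (HttpURLConnection) raw;
            connection.setRequestMethod("GET");
            // Both timeouts default to 0, which means "wait forever".
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            int code = connection.getResponseCode();
            // getResponseCode() returns -1 when the reply is not valid HTTP
            return code == -1 ? OptionalInt.empty() : OptionalInt.of(code);
        } catch (URISyntaxException | IllegalArgumentException | IOException e) {
            // SocketTimeoutException is an IOException, so timeouts land here
            return OptionalInt.empty();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java UrlStatusCheck <url>");
            return;
        }
        OptionalInt status = fetchStatus(args[0], 5000);
        System.out.println(status.isPresent()
                ? "Status code: " + status.getAsInt()
                : "No response (bad URL, connection failure, or timeout)");
    }
}
```

With the timeouts in place, a call for a slow or broken URL should either return its status code (for example 500) or come back empty once a timeout expires, instead of blocking indefinitely.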

I don't know why my question has been downvoted, despite following the guidelines and clearly stating what the issue is, the research done, and what specific help I seek.

iCoder