
I am checking the active links on a website with Selenium WebDriver and Java. I collect the links into a list, but when I verify them I get 403 Forbidden for every link on the site. It is a public website that anyone can access, and the links work properly when clicked manually. Why is the response not 200, and what can be done about it?

This is for Selenium WebDriver with Java.

for (WebElement link : activelinks) {
    // Read the href once instead of querying the element three times.
    String href = link.getAttribute("href");
    System.out.println("Active Link address and status >>> " + href);
    HttpURLConnection connection = (HttpURLConnection) new URL(href).openConnection();
    connection.connect();
    String response = connection.getResponseMessage();
    int responsecode = connection.getResponseCode();
    connection.disconnect();
    System.out.println(href + ">>" + response + " " + responsecode);
}

I expect a response code of 200, but the actual output is 403.

Lipson T A

3 Answers


I believe you need to add the relevant cookies to the HttpURLConnection or, even better, switch to the OkHttp library, which the Selenium Java client uses under the hood.

So you basically need to fetch the cookies from the browser with driver.manage().getCookies() and build a proper Cookie request header for the subsequent calls.

Example code:

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

// Collect the browser's cookies into a single Cookie header value.
StringBuilder cookieBuilder = new StringBuilder();
driver.manage().getCookies()
        .forEach(cookie -> cookieBuilder
                .append(cookie.getName())
                .append("=")
                .append(cookie.getValue())
                .append(";"));

OkHttpClient client = new OkHttpClient().newBuilder().build();

for (WebElement activelink : activelinks) {
    String href = activelink.getAttribute("href");
    Request request = new Request.Builder()
            .url(href)
            .addHeader("Cookie", cookieBuilder.toString())
            .build();
    // Close each response so its connection is released.
    try (Response urlResponse = client.newCall(request).execute()) {
        System.out.println(href + ">>" + urlResponse.message() + " " + urlResponse.code());
    }
}
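
For reference, a Cookie request header is conventionally a single line of name=value pairs separated by "; ". The sketch below builds that string from a plain map so that it runs without Selenium; the class name CookieHeader and the map input are illustrative assumptions, and in a real test the pairs would come from driver.manage().getCookies().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CookieHeader {
    // Joins name=value pairs with "; " to form a Cookie request header value.
    static String build(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Hypothetical cookie values, for illustration only.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("sessionid", "abc123");
        cookies.put("theme", "dark");
        System.out.println(build(cookies)); // prints "sessionid=abc123; theme=dark"
    }
}
```

Joining with a collector avoids the trailing separator that a manual append loop leaves behind.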

If you need nothing but the response code, consider using the HEAD method so that the response bodies are not downloaded; this saves traffic and makes your test much faster.
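
A minimal sketch of such a HEAD check, written against the JDK's HttpURLConnection rather than OkHttp so that it has no dependencies. The class name HeadCheck, the method name, and the five-second timeouts are assumptions for illustration, and some servers reject HEAD, so treat the result as a hint rather than proof.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class HeadCheck {
    // Sends a HEAD request and returns only the status code; no body is transferred.
    static int headStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5_000); // assumed timeout values
        conn.setReadTimeout(5_000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

In the link loop, this would be called as HeadCheck.headStatus(activelink.getAttribute("href")).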

Dmitri T

403 Forbidden

The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it.

This status is similar to 401, but in this case, re-authenticating will make no difference. The access is permanently forbidden and tied to the application logic, such as insufficient rights to a resource.


Reason

I don't see any issue in your code block. However, it is possible that the WebDriver-controlled browser is being detected and the subsequent requests are therefore blocked. Numerous factors can contribute to this:

  • User agent
  • Plugins
  • Languages
  • WebGL
  • Browser features
  • Missing image
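
On the first item, note that the HttpURLConnection calls in the question do not go through the browser at all. Unless a User-Agent is set explicitly, Java sends its own default of the form Java/<version>, which some servers reject. A minimal sketch of overriding it follows; the class name BrowserLikeRequest and the sample user-agent string are assumptions for illustration, and the real value could be copied from the browser under test.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class BrowserLikeRequest {
    // Sample browser string (an assumption); replace with the one your browser sends.
    static final String USER_AGENT =
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36";

    // Prepares (but does not yet send) a request carrying a browser-style User-Agent.
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }
}
```

Calling getResponseCode() on the returned connection sends the request with that header.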

You can find a couple of detailed discussions in:


Solution

A generic solution will be to use a proxy or rotating proxies from the Free Proxy List.

You can find a detailed discussion in Change proxy in chromedriver for scraping purposes


Outro

You can find a couple of relevant discussions in:

undetected Selenium

Had the same problem; the user agent was the issue in my case (read more here: https://www.javacodegeeks.com/2018/05/how-to-handle-http-403-forbidden-error-in-java.html).

Also check which request methods your website allows. You can do that by looking at any endpoint in the "Network" tab in Chrome, which should list the allowed request methods. In my case I couldn't use "HEAD", but "GET" did the trick.

Code:

List<WebElement> links = driver.findElements(By.tagName("a"));
boolean brokenLink = false;
for (WebElement link : links) {
    String url = link.getAttribute("href");
    // Anchors without an href would make new URL(...) throw, so skip them.
    if (url == null || url.isEmpty()) {
        continue;
    }
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    conn.setRequestProperty("Content-Type", "application/json");
    // Browser-like User-Agent; this is what resolved the 403 in my case.
    conn.setRequestProperty("User-Agent",
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36");

    conn.connect();
    int httpCode = conn.getResponseCode();
    conn.disconnect();
    if (httpCode >= 400) {
        System.out.println("BROKEN LINK: " + url + " " + httpCode);
        brokenLink = true;
    } else {
        System.out.println("Working link: " + url + " " + httpCode);
    }
}
// Assert once after the loop so that every broken link is reported before the test fails.
Assert.assertFalse(brokenLink);
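
The HEAD restriction described in this answer can also be handled in code instead of by inspection. A minimal sketch, assuming the server answers a disallowed HEAD with 405 (Method Not Allowed) or 501 (Not Implemented); servers are not obliged to do so, so this is a heuristic. The class name, method names, and the short placeholder user-agent are illustrative assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class LinkStatus {
    // Sends one request with the given method and returns the status code.
    static int request(String method, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // placeholder value (assumption)
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    // Tries the cheaper HEAD first and repeats with GET only if HEAD is refused.
    static int statusOf(String url) throws IOException {
        int code = request("HEAD", url);
        if (code == HttpURLConnection.HTTP_BAD_METHOD
                || code == HttpURLConnection.HTTP_NOT_IMPLEMENTED) {
            code = request("GET", url);
        }
        return code;
    }
}
```

Inside the loop above, LinkStatus.statusOf(url) could stand in for the hand-written connection code while keeping the same httpCode >= 400 check.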
Kamil