
I'm trying to fetch a CSV-formatted webpage to use as a rudimentary database. The test page is at http://prog.bhstudios.org/bhmi/database/get, and browsers open it no problem. However, when I run the following code, Java throws a 403 error:

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.logging.Level;
import java.util.logging.Logger;

public class Main
{

    static
    {
        Logger.getGlobal().setLevel(Level.ALL);
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws IOException
    {
        InputStream is = null;
        try
        {
            System.out.println("Starting...");
            URL url = new URL("http://prog.bhstudios.org/prog/bhmi/database/get/");
            URLConnection urlc = url.openConnection();
            urlc.connect();
            is = urlc.getInputStream();
            int data;
            while ((data = is.read()) != -1)
            {
                System.out.print((char)data);
            }
            System.out.println("\r\nSuccess!");
        }
        catch (IOException ex)
        {
            Logger.getGlobal().log(Level.SEVERE, ex.getMessage(), ex);
            System.out.println("\r\nFailure!");
        }
        if (is != null)
            is.close();
    }
}

Here's the console output:

Starting...
Nov 18, 2013 3:01:48 PM org.bh.mi.Main main
SEVERE: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
java.io.IOException: Server returned HTTP response code: 403 for URL: http://prog.bhstudios.org/prog/bhmi/database/get/
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1626)
    at org.bh.mi.Main.main(Main.java:36)
Failure!

Note that 403 means the server received and understood the request but refuses to fulfill it. Now here's the kicker: if I fetch, say, http://example.com, it works just fine!

How can I get my Java app to read this file from my webserver?

Ky -

2 Answers


I tested against your server, and if I submit the request (using TamperData) with User-Agent: Java/1.6.0_14 (I just picked a random Java version), your web server responds with 403 Forbidden.

My browser shows the following error message:

Error 1010
Access denied
What happened?

The owner of this website (prog.bhstudios.org) has banned your access based on your browser's signature (cf7ab9f58210755-ua21).

In other words, your server (or more likely your proxy, as the response headers indicate both cloudflare-nginx and ASP.NET) filters requests based on the User-Agent string. This is probably done to keep bots and screen scrapers away from your websites.
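
The same error page can be read from Java instead of a browser. The following is a minimal sketch, not the actual server setup: `startStandIn` is a hypothetical local stand-in that merely imitates the filter described above (403 for any `Java/...` user agent), and the response texts are made up for illustration. It assumes Java 9+ for `readAllBytes`. The point is that `getResponseCode()` does not throw on a 403, and `getErrorStream()` exposes the body that `getInputStream()` hides behind an IOException.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ErrorBodyDemo {

    // Hypothetical stand-in for the filtering proxy: rejects any "Java/..." user agent.
    public static HttpServer startStandIn() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = ua != null && ua.startsWith("Java/");
            byte[] body = (blocked ? "Error 1010: Access denied" : "id,name\n1,test")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Returns "<status> <body>", reading the error stream when the status is 4xx/5xx.
    public static String describe(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode(); // returns 403 instead of throwing
        InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
        try (InputStream body = in) { // a null resource is simply skipped
            String text = body == null
                    ? ""
                    : new String(body.readAllBytes(), StandardCharsets.UTF_8);
            return status + " " + text;
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startStandIn();
        try {
            int port = server.getAddress().getPort();
            URL url = URI.create("http://127.0.0.1:" + port + "/").toURL();
            // Sent with the default "Java/<version>" user agent, so the stand-in rejects it.
            System.out.println(describe(url));
        } finally {
            server.stop(0);
        }
    }
}
```

Pointing `describe` at the real URL should print whatever error page the proxy returns, which is usually more informative than the bare IOException message.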

You either need to drop this filter (ask your proxy administrator) or set a different user agent for the URLConnection; see Setting user agent of a java URLConnection and How to modify the header of a HttpUrlConnection.

Mark Rotteveel

Your server for some reason is configured to forbid access when the request header

User-Agent: Java/...

is present. I was able to reproduce the problem and also got it to work by doing

URLConnection urlc = url.openConnection();
urlc.setRequestProperty("User-Agent", "");
urlc.connect();
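
For completeness, here is a sketch of the same idea as a small standalone program. The class name `FetchWithUserAgent`, the helper `fetch`, and the `bhmi-client/1.0` user agent string are all made up for illustration, and whether the proxy accepts any particular string is an assumption you would need to verify. It also assumes the response is UTF-8 text. The header has to be set before the connection is opened, and try-with-resources closes the stream even when reading fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class FetchWithUserAgent {

    // Downloads the body at url, sending userAgent instead of the default "Java/<version>".
    public static String fetch(URL url, String userAgent) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("User-Agent", userAgent); // must happen before connecting
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumes UTF-8
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java FetchWithUserAgent <url>");
            return;
        }
        URL url = URI.create(args[0]).toURL();
        // "bhmi-client/1.0" is a placeholder; pick a string your proxy allows.
        System.out.println(fetch(url, "bhmi-client/1.0"));
    }
}
```

Run it as `java FetchWithUserAgent http://prog.bhstudios.org/prog/bhmi/database/get/`. Reading into a buffer rather than one byte at a time also avoids the per-byte `(char)` cast, which would mangle any non-ASCII characters in the CSV.
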
Jim Garrison