
I'm on a corporate network trying to write a JSoup web scraper in Java, and I can't seem to connect.

To test the connection, I ran the following code, which throws a java.net.ConnectException: Connection refused.

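    Socket socket = null;
    try {
        socket = new Socket("google.com", 80); // plain TCP connection to port 80
        System.out.println("it works!");
    } finally {
        // close the socket quietly whether or not the connection succeeded
        if (socket != null) try { socket.close(); } catch (IOException e) {}
    }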

For the record, my JSoup code looks like this:

    Connection con = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page");
    Document doc = con.get();

When I run it on its own, it throws a timeout exception (even after allotting it a generous timeout). What should I do to get it to work within my network?

Slothario
  • Have you tried Googling JSoup and Proxy? – matt freake Feb 19 '13 at 15:54
  • Let's start from the beginning. Do you have direct access to the Internet on your computer, or do you go through a corporate network proxy? – vacuum Feb 19 '13 at 16:05
  • No, port 80 will not be wide open in a corporate environment. In Internet Explorer, go to Tools -> Internet Options. Left click the Connections tab, then left click the LAN settings button. The proxy information you need to put in your Java code is at the bottom of the LAN settings dialog. – Gilbert Le Blanc Feb 19 '13 at 16:05

1 Answer


I found a solution: I simply had to find my proxy settings and set them in my code.

    // For HTTPS URLs, set https.proxyHost and https.proxyPort the same way
    System.setProperty("http.proxyHost", "<proxyip>");   // proxy server host or IP
    System.setProperty("http.proxyPort", "<proxyport>"); // proxy server port

    Document doc = Jsoup.connect("http://your.url.here").get(); // Jsoup now connects via the proxy
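
If you want to check that the properties were actually picked up before involving Jsoup at all, something like the following might help. It is only a sketch using the standard library: the class name `ProxyCheck`, its two helper methods, and the host `proxy.example.com` with port `8080` are made-up placeholders, not values from my network. It sets the same system properties as above (plus the HTTPS pair) and then asks the JVM's default `ProxySelector` which proxy it would use for a given URL. No network traffic is needed for this check.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

public class ProxyCheck {

    /** Sets the JVM-wide proxy properties for both HTTP and HTTPS URLs. */
    public static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", String.valueOf(port));
        System.setProperty("https.proxyHost", host);
        System.setProperty("https.proxyPort", String.valueOf(port));
    }

    /** Returns the first proxy the default selector would try for this URL. */
    public static Proxy proxyFor(String url) {
        List<Proxy> candidates = ProxySelector.getDefault().select(URI.create(url));
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical proxy address; replace with the values from your LAN settings.
        useProxy("proxy.example.com", 8080);

        // Prints the proxy chosen for an HTTP URL; a DIRECT result would mean
        // the properties were not applied.
        System.out.println(proxyFor("http://en.wikipedia.org/wiki/Main_Page"));
    }
}
```

Note that a raw `new Socket("google.com", 80)` does not go through the `http.proxyHost` setting, so that kind of test can keep failing even when HTTP requests through the proxy work.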

You might also need to set the user agent. I left the referrer call in, although I don't think it's necessary in most cases. The userAgent string just imitates a desktop browser, in case the web server you're accessing rejects requests that don't look like they come from one.

    doc = Jsoup.connect("https://www.facebook.com/")
      .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
      .referrer("http://www.google.com")
      .get();
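
For troubleshooting without Jsoup, the same idea (proxy, user agent, referrer, timeout) can be tried with the JDK's own `HttpURLConnection`. This is a different technique from the Jsoup calls above, shown only as a rough sketch: the class `ProxiedRequestSketch`, the method `prepare`, the proxy `proxy.example.com:8080`, and the 10-second timeouts are all assumptions for illustration. Here the proxy is attached to one connection instead of being set JVM-wide.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

public class ProxiedRequestSketch {

    /** Builds (but does not yet open) an HTTP connection routed through the given proxy. */
    public static HttpURLConnection prepare(String url, String proxyHost, int proxyPort)
            throws IOException {
        // The proxy applies to this connection only, not to the whole JVM.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(proxyHost, proxyPort));
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(proxy);

        // Browser-like headers, mirroring userAgent(...) and referrer(...) in Jsoup.
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
        conn.setRequestProperty("Referer", "http://www.google.com");

        // Fail after 10 seconds instead of hanging on a blocked route.
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical proxy address; use the values from your own network.
        HttpURLConnection conn =
                prepare("http://en.wikipedia.org/wiki/Main_Page", "proxy.example.com", 8080);
        // The request is actually sent here, when the status code is read.
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

If this request goes through but the Jsoup call still times out, the proxy settings are probably not reaching Jsoup. If both fail, the proxy address itself (or proxy authentication, which this sketch does not handle) is the more likely culprit.
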
Slothario