6

I have a problem downloading a file from a URL like www.example.com/example.pdf via a proxy and saving it to the filesystem in Java. Does anybody have an idea how this could work? Once I have the InputStream, I can simply save it to the filesystem like this:

final ReadableByteChannel rbc = Channels.newChannel(httpUrlConnetion.getInputStream());    
final FileOutputStream fos = new FileOutputStream(file);
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
fos.close();

But how do I get the InputStream of a URL via a proxy? If I do it like this:

SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);

I get this exception:

java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at app.model.mail.crawler.newimpl.FileLoader.getSourceOfSiteViaProxy(FileLoader.java:167)
    at app.model.mail.crawler.newimpl.FileLoader.process(FileLoader.java:220)
    at app.model.mail.crawler.newimpl.FileLoader.run(FileLoader.java:57)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Using this:

final HttpURLConnection httpUrlConnetion = (HttpURLConnection) website.openConnection(proxy);
httpUrlConnetion.setDoOutput(true);
httpUrlConnetion.setDoInput(true);
httpUrlConnetion.setRequestProperty("Content-type", "text/xml");
httpUrlConnetion.setRequestProperty("Accept", "text/xml, application/xml");
httpUrlConnetion.setRequestMethod("POST");
httpUrlConnetion.connect();

I am able to download the source of a site, which is HTML, but not a file. Maybe someone could help me with the properties I have to set for downloading a file.

Exagon
  • If you just need to set your proxy settings, see [this document](https://docs.oracle.com/javase/6/docs/technotes/guides/net/proxies.html) from Oracle, or if you want to cut to the chase, [this old StackOverflow question](http://stackoverflow.com/questions/120797/how-do-i-set-the-proxy-to-be-used-by-the-jvm). – Eric Galluzzo Dec 16 '15 at 12:56
  • System properties would not work because I want to use a different proxy in every thread that executes a download, so I have to set the proxy on each connection. – Exagon Dec 16 '15 at 12:57
  • The Oracle document above specifies how to do this. I've added an answer with some sample code. – Eric Galluzzo Dec 16 '15 at 13:00
  • It doesn't work for me; it gives me an exception. – Exagon Dec 16 '15 at 14:08
  • Maybe it's not a proxy problem. See http://stackoverflow.com/questions/585599/whats-causing-my-java-net-socketexception-connection-reset for a list of causes of this error. My guess is a timeout; to check that, try to download a very small file. – malarres Jan 11 '16 at 13:37

4 Answers

6

To set a proxy programmatically:

SocketAddress addr = new InetSocketAddress("my.proxy.com", 8080);
Proxy proxy = new Proxy(Proxy.Type.HTTP, addr);
URL url = new URL("http://my.real.url.com/");
URLConnection conn = url.openConnection(proxy);

Then you can use your code above with the URLConnection returned on the last line. You can also use a SOCKS proxy, or force no proxy, if you so desire.

This was taken (and slightly edited) from this Oracle documentation.
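For completeness, here is a minimal sketch of how the proxied connection could be combined with saving the response to a file. It is not from the Oracle documentation: the class name `ProxyDownload`, the method `download`, and the timeout values are my own assumptions for illustration. It sends a plain GET (no `setDoOutput(true)` or POST), which is usually what a file download needs, and it takes the `Proxy` as a parameter so that each thread can pass its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class ProxyDownload {

    /** Downloads url through the given proxy into target and returns the number of bytes written. */
    static long download(URL url, Proxy proxy, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
        conn.setConnectTimeout(10_000); // assumed value, in milliseconds
        conn.setReadTimeout(30_000);    // assumed value, in milliseconds
        try {
            int status = conn.getResponseCode(); // sends the GET request
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                // Streams the body to disk, replacing any existing file.
                return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 4) {
            System.out.println("usage: ProxyDownload <proxyHost> <proxyPort> <url> <targetFile>");
            return;
        }
        // Use Proxy.Type.SOCKS for a SOCKS proxy, or pass Proxy.NO_PROXY to bypass proxies.
        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress(args[0], Integer.parseInt(args[1])));
        long bytes = download(new URL(args[2]), proxy, Paths.get(args[3]));
        System.out.println("Saved " + bytes + " bytes to " + args[3]);
    }
}
```

For example, `java ProxyDownload my.proxy.com 8080 http://www.example.com/example.pdf example.pdf` would try to fetch the PDF through the proxy. Checking the status code first means that a proxy error page is not silently saved as the file.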

Eric Galluzzo
  • If I do it like this, I get an exception. See my question again; I will edit it. – Exagon Dec 16 '15 at 14:04
  • Unfortunately it's difficult to tell why the connection would be reset in your case. Have you tried accessing the URL in a browser, with the same proxy settings, and ensured that it works there? Are you using the right type of proxy (SOCKS vs. HTTP)? – Eric Galluzzo Dec 16 '15 at 14:18
  • I am using a SOCKS proxy. Yes, I did, and it worked in the browser... I have now tried a lot of other sites, but it never worked. – Exagon Dec 16 '15 at 14:20
  • Did you change the `Proxy.Type.HTTP` in the code to `Proxy.Type.SOCKS`? You might try both just in case. – Eric Galluzzo Dec 16 '15 at 14:28
  • Hmmm, I'm not sure then. I assume you've verified your proxy host and port in your code. Other than that, I'm not sure what to suggest. :( – Eric Galluzzo Dec 16 '15 at 14:31
5

You can use the Apache HttpClient library, which solves most of the issues with proxies. To compile the code below, you can use the following Maven POM:

Maven:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>stackoverflow.test</groupId>
  <artifactId>proxyhttp</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>proxy</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.1</version>
    </dependency>
  </dependencies>
</project>

Java code:

import org.apache.http.HttpHost;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * How to send a request via proxy.
 *
 * @since 4.0
 */
public class ClientExecuteProxy {

    public static void main(String[] args) throws Exception {
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            HttpHost target = new HttpHost("www.google.com", 80, "http");
            HttpHost proxy = new HttpHost("127.0.0.1", 8889, "http");

            RequestConfig config = RequestConfig.custom()
                    .setProxy(proxy)
                    .build();
            HttpGet request = new HttpGet("/");
            request.setConfig(config);

            System.out.println("Executing request " + request.getRequestLine() + " to " + target + " via " + proxy);

            CloseableHttpResponse response = httpclient.execute(target, request);
            try {
                System.out.println("----------------------------------------");
                System.out.println(response.getStatusLine());
                System.out.println(EntityUtils.toString(response.getEntity()));
            } finally {
                response.close();
            }
        } finally {
            httpclient.close();
        }
    }

}
Marco Altieri
  • I am getting an HTTP response code 411, a read timeout, or a connect timeout... any ideas? – Exagon Jan 11 '16 at 20:34
  • @Exagon I have updated the code, because last time I used code that I wrote for an old version, using classes that have all been deprecated. I retested the code using fiddler2 as a proxy. It worked fine. If you get a timeout, it is probably a networking issue. – Marco Altieri Jan 11 '16 at 22:41
  • By the way, the example is just a "copy and paste" of: https://hc.apache.org/httpcomponents-client-ga/httpclient/examples/org/apache/http/examples/client/ClientExecuteProxy.java – Marco Altieri Jan 11 '16 at 22:42
  • Sorry, but I am getting an error at request.setConfig(config); "The method setConfig(RequestConfig) is undefined for the type HttpGet" – Exagon Jan 12 '16 at 20:59
  • @exagon What version of the library are you using? If you do not want to use Maven, you can download the version that I used from: http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.5.1/httpclient-4.5.1.jar – Marco Altieri Jan 12 '16 at 21:37
  • The newest, 4.5.1. My IDE is Eclipse Mars and I am using Java 8.65. – Exagon Jan 12 '16 at 21:39
  • mmm I see... I am not on JDK 8. Let me check – Marco Altieri Jan 12 '16 at 21:40
  • It worked for me on JDK 8. Is your error at runtime or compile time? – Marco Altieri Jan 12 '16 at 22:05
  • It's a compile-time error... I don't know why. It also appears when I create a new project and just add the library and this class. – Exagon Jan 12 '16 at 22:09
  • HttpGet has the method setConfig since the beginning. As I said, the example is from the apache httpclient site so it has to work. – Marco Altieri Jan 12 '16 at 22:53
5

The following is different from the other answers and works for me: set these properties before the connection:

System.setProperty("http.proxySet", "true");
System.setProperty("http.proxyHost", "my.proxy.com");
System.setProperty("http.proxyPort", "8080"); // the port is a String, not an int

Then, open the URLConnection and try to download the file.
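These system properties apply to the whole JVM, so every thread shares the same proxy. If each download thread needs its own proxy (as the comments on the question ask) and you still want to call `url.openConnection()` without arguments, one option might be a custom `java.net.ProxySelector` backed by a `ThreadLocal`. This is only a sketch: the class name `PerThreadProxySelector` and the method `useForThisThread` are made up for illustration, and it assumes that the selector is consulted on the same thread that opens the connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

final class PerThreadProxySelector extends ProxySelector {

    // Each thread starts without a proxy until it registers one.
    private static final ThreadLocal<Proxy> CURRENT =
            ThreadLocal.withInitial(() -> Proxy.NO_PROXY);

    /** Registers the HTTP proxy that the calling thread should use from now on. */
    static void useForThisThread(String host, int port) {
        CURRENT.set(new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, port)));
    }

    @Override
    public List<Proxy> select(URI uri) {
        if (uri == null) {
            throw new IllegalArgumentException("uri must not be null");
        }
        String scheme = uri.getScheme();
        boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        // Only HTTP(S) requests go through the thread's proxy; everything else connects directly.
        return Collections.singletonList(http ? CURRENT.get() : Proxy.NO_PROXY);
    }

    @Override
    public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
        // No failover in this sketch; a real implementation might log the failure.
    }
}
```

To try it, install the selector once at startup with `ProxySelector.setDefault(new PerThreadProxySelector())`, then have each worker thread call `PerThreadProxySelector.useForThisThread("my.proxy.com", 8080)` before it opens connections.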

Eduardo Poço
2

Another approach is to implement the proxy "inside" each instance of httpUrlConnection. That is:

  1. Do not connect to the real URL you want. Instead, connect to the proxy IP and port, and send an HTTP GET request that refers to the full URL you want.
  2. Use setRequestProperty to set the Host header to your URL's host, plus any other headers you may need.

If it works, the connection will transparently send the file to you.

I have some code that worked with Sockets.

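// Needs java.io.IOException, java.io.InputStream, java.io.OutputStream,
// java.net.Socket and java.nio.charset.StandardCharsets.
try (Socket sock = new Socket("10.0.241.1", 3128)) { // proxy IP and port; closed automatically
    InputStream is = sock.getInputStream();
    OutputStream os = sock.getOutputStream();
    String str = "GET http://www.uol.com.br HTTP/1.1\r\n"; // GET your site, as an absolute URL
    str += "Host: www.uol.com.br\r\n"; // again, the host of your site
    str += "Proxy-Authorization: Basic ZWR1YXJkby5wb2NvOmM1NmQyMw==\r\n"; // only if a password is needed
    str += "Connection: close\r\n"; // ask for the connection to be closed so the read loop can end
    str += "\r\n";
    os.write(str.getBytes(StandardCharsets.US_ASCII));
    os.flush();
    byte[] bb = new byte[1024];
    int n;
    while ((n = is.read(bb)) != -1) {
        // write the first n bytes of bb to the file stream...
        // (the response starts with the HTTP status line and headers, not the file content)
    }
} catch (IOException ex) {
    // exception handling...
}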
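One thing to keep in mind with raw sockets: the bytes read in a loop like the one above start with the HTTP status line and headers, so they cannot be written to the file as they are. Here is a rough sketch of how the start of the body could be located. The class name `RawHttpResponse` and the method `bodyOffset` are hypothetical, and the sketch assumes that the headers are already in the buffer and that the body is not sent with chunked transfer encoding.

```java
import java.nio.charset.StandardCharsets;

final class RawHttpResponse {

    /**
     * Returns the index of the first body byte within the first length bytes
     * of response, or -1 if the blank line ending the headers is not there yet.
     */
    static int bodyOffset(byte[] response, int length) {
        for (int i = 0; i + 3 < length; i++) {
            // The headers end with an empty line, i.e. the byte sequence CR LF CR LF.
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hand-written sample response, used instead of a real network read.
        byte[] raw = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
                .getBytes(StandardCharsets.US_ASCII);
        int offset = bodyOffset(raw, raw.length);
        // prints "hello"
        System.out.println(new String(raw, offset, raw.length - offset, StandardCharsets.US_ASCII));
    }
}
```

In a real download you would collect bytes until `bodyOffset` returns something other than -1, then write everything from that offset onwards to the file.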

"Why would somebody use pure sockets when one could use httpUrlConnection?", you say. Well, by that time, I didn't know about httpUrlConnection.

Eduardo Poço
  • Could you show how to do this, with all the properties, in some code? – Exagon Jan 12 '16 at 21:27
  • Edited into the answer. This implementation is from the time when I didn't know about httpUrlConnection, so I used sockets. I did the edit in a hurry; I think you can figure out the equivalent operations on a httpUrlConnection. If you need, I'll edit it again to fit a httpUrlConnection. – Eduardo Poço Jan 13 '16 at 03:07