18

Updated my question

I'm building a crawler in Java to compare prices online. However, I'm worried that my IP address could get banned, so I intend to use a proxy to change my IP dynamically, or some tool that rotates IPs automatically.

Many people say that Tor is a powerful tool for rotating IPs. However, I don't know how to use Tor or how to integrate it into a Java web application.

I've searched Google for examples but haven't found anything useful.

Can anyone help me?

DimaSan
Leo Le
  • 3
    There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs. – CodingIntrigue Apr 15 '15 at 15:07
  • Hi @RGraham, I appreciate your comment. I've updated my question. Can you help me ? – Leo Le Apr 15 '15 at 16:40
  • 7
    Why close this question? It's useful to me. – faster2b Sep 28 '15 at 12:39

1 Answer

21

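You just need to have Java route its outgoing HTTP connections (for example, those made with URLConnection) through Tor's local proxy while the Tor service is running. Tor itself exposes a SOCKS proxy (SOCKS4, SOCKS4a and SOCKS5), by default at localhost:9050 for the standalone daemon or localhost:9150 for Tor Browser. Port 8118 is the default port of Privoxy, an HTTP proxy that is often set up to forward to Tor, so it only applies if you run such a proxy in front of Tor. See here for how to use proxies in Java 8.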
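For the URLConnection route, something along these lines should work. Treat it as a minimal sketch rather than a tested recipe: it assumes Tor's SOCKS listener is at 127.0.0.1:9050 (adjust the port to your setup), and the class and method names (`TorUrlConnectionSketch`, `torProxy`, `open`) are made up for illustration. Only `java.net` classes from the standard library are used.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

class TorUrlConnectionSketch {
    // Assumption: a local Tor daemon listens on its default SOCKS port 9050
    // (Tor Browser's bundled Tor uses 9150 instead).
    static final String TOR_HOST = "127.0.0.1";
    static final int TOR_SOCKS_PORT = 9050;

    // Describes the local Tor SOCKS endpoint; creating it opens no connection.
    static Proxy torProxy() {
        return new Proxy(Proxy.Type.SOCKS,
                new InetSocketAddress(TOR_HOST, TOR_SOCKS_PORT));
    }

    // Prepares a connection that will be routed through the SOCKS proxy.
    // Nothing is sent until connect() or getResponseCode() is called.
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection(torProxy());
        conn.setConnectTimeout(30_000);
        conn.setReadTimeout(30_000);
        return conn;
    }

    public static void main(String[] args) {
        try {
            HttpURLConnection conn = open("http://www.google.com/");
            System.out.println("HTTP status via Tor: " + conn.getResponseCode());
            conn.disconnect();
        } catch (IOException e) {
            // Typically means nothing is listening on the assumed SOCKS port.
            System.out.println("Request failed (is Tor running?): " + e.getMessage());
        }
    }
}
```

Passing the `Proxy` per connection, instead of setting the global `socksProxyHost` system property, keeps the rest of the application's traffic off Tor.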

Edit: there is also this pure Java Tor library that you may be able to use, either directly or through minor modification (if it acts entirely like the normal native Tor service), but it hasn't been updated in a while so may not be compatible with the latest Tor specification.

HttpClient example:

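import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.impl.client.DefaultHttpClient;

// 8118 is an HTTP proxy port (e.g. Privoxy forwarding to Tor),
// not Tor's own SOCKS port.
HttpHost proxy = new HttpHost("127.0.0.1", 8118, "http");

// HttpClient 4.x API; DefaultHttpClient is deprecated as of 4.3.
DefaultHttpClient httpclient = new DefaultHttpClient();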
try {
    httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

    HttpHost target = new HttpHost("www.google.com", 80, "http");
    HttpGet req = new HttpGet("/");

    System.out.println("executing request to " + target + " via " + proxy);
    HttpResponse rsp = httpclient.execute(target, req);
    ...
} finally {
    // When HttpClient instance is no longer needed,
    // shut down the connection manager to ensure
    // immediate deallocation of all system resources
    httpclient.getConnectionManager().shutdown();
}

Note that you must have the Tor service running for this.

Chris Dennett
  • Please tick my answer as the best answer if it helped you :) – Chris Dennett Apr 15 '15 at 19:43
  • Hi @Chris Dennett, can I use Tor with HttpClient (instead of URLConnection)? I want to run multiple threads. – Leo Le Apr 15 '15 at 23:28
  • Sure, see this question/answer: http://stackoverflow.com/questions/9811828/common-httpclient-and-proxy – Chris Dennett Apr 15 '15 at 23:38
  • Your answer would be the best if you gave me an example of integrating Tor with HttpClient in Java, please. Thanks so much @Chris Dennett – Leo Le Apr 15 '15 at 23:52
  • 1
    How can I start the Tor service? I've only found how to install Tor Browser on Linux (https://www.torproject.org/projects/torbrowser.html.en) – Leo Le Apr 16 '15 at 04:49
  • It should come with a command named `tor` that is installed when you install Tor Browser on Linux. You should have the user start this, or start it from within Java. Use `new ProcessBuilder("path_to_tor","param1","param2").start();` and it should run in the background, ready to accept connections (parameters are probably not needed, but included just in case) – Chris Dennett Apr 16 '15 at 10:02
  • Try `which tor` to see where it's been installed to, if it's locatable within `PATH` (it's always a good idea to specify the path when possible). Alternatively, try `find / -name tor`. – Chris Dennett Apr 16 '15 at 10:04
  • 1
    I've tried Orchid and it turns out that **it does not support fetching HTTP pages**: `com.subgraph.orchid.TorException: Returning HTTP page not implemented at com.subgraph.orchid.socks.SocksClientTask.sendHttpPage(SocksClientTask.java:112)` As I understand it, that's not because it lacks HTTP support in general, but because it exposes a SOCKS proxy rather than an HTTP proxy? So I should still try forwarding Apache HttpClient through the SOCKS proxy. – Piotr Müller Nov 15 '16 at 19:40
  • IDK why on earth this non-working answer was accepted! **Tor is NOT an HTTP proxy!!** – Yahya Jun 27 '18 at 09:00
  • You're talking about the difference between SOCKS and HTTP proxy? It's not that far away, just use this: https://stackoverflow.com/questions/22937983/how-to-use-socks-5-proxy-with-apache-http-client-4. Tor can use either, but SOCKS is probably recommended. – Chris Dennett Jul 24 '18 at 01:25
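
The comments above suggest launching `tor` from Java with `ProcessBuilder`. A rough sketch of that idea follows. It assumes the `tor` binary writes notice-level logs to stdout (its default when no log file is configured) and prints a line containing "Bootstrapped 100%" once it is ready. The class and method names (`TorLauncherSketch`, `isBootstrapped`, `startTor`) are hypothetical, and the fallback path `tor` assumes the binary is on `PATH`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class TorLauncherSketch {
    // Assumption: tor prints a line containing this text once bootstrap finishes.
    static boolean isBootstrapped(String logLine) {
        return logLine != null && logLine.contains("Bootstrapped 100%");
    }

    // Starts tor and blocks until it reports a finished bootstrap or exits.
    static Process startTor(String torPath) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(torPath);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process tor = pb.start();
        BufferedReader log = new BufferedReader(
                new InputStreamReader(tor.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while ((line = log.readLine()) != null) {
            if (isBootstrapped(line)) {
                // Keep draining output in the background so tor never
                // blocks on a full pipe.
                Thread drain = new Thread(() -> {
                    try {
                        while (log.readLine() != null) { /* discard */ }
                    } catch (IOException ignored) {
                        // stream closed because the process ended
                    }
                });
                drain.setDaemon(true);
                drain.start();
                return tor;
            }
        }
        throw new IOException("tor exited before finishing bootstrap");
    }

    public static void main(String[] args) {
        // Hypothetical default: rely on PATH unless a path is passed in.
        String torPath = args.length > 0 ? args[0] : "tor";
        try {
            Process tor = startTor(torPath);
            System.out.println("Tor is ready");
            // ... issue requests through the proxy here ...
            tor.destroy();
        } catch (IOException e) {
            System.out.println("Could not start tor: " + e.getMessage());
        }
    }
}
```

Waiting for the bootstrap line matters because requests sent through the proxy before Tor has built its circuits are likely to fail.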