I managed to run Scrapy with Tor by following this guide: http://pkmishra.github.io/blog/2013/03/18/how-to-run-scrapy-with-TOR-and-multiple-browser-agents-part-1-mac/

But I couldn't get Splash to run with Tor.

In Scrapy's settings.py, I pointed HTTP_PROXY at Polipo (8123 is the Polipo port):

HTTP_PROXY = 'http://127.0.0.1:8123'

In polipo.config, I pointed Polipo at Tor (9150 is the Tor port):

socksParentProxy = localhost:9150

diskCacheRoot=""

This works perfectly for Scrapy, but not for Splash. I need to tell Splash (or Docker) to use Polipo as its HTTP proxy, the way settings.py does for Scrapy: Docker should somehow use Polipo, and Polipo will forward to Tor. How can I do that?

I run Splash with:

sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash

In /etc/default/docker, I tried to point Docker at Polipo with this:

export http_proxy='http://127.0.0.1:8123'
Environment="http_proxy=http://127.0.0.1:8123"

But it didn't work. What am I doing wrong? Thanks :)

Gallaecio

1 Answer

You need to

  1. make Tor accessible from the Splash Docker container;
  2. tell Splash to use this Tor proxy.

For (2), you can use Splash proxy profiles or set the proxy directly, either with the proxy argument or by calling request:set_proxy in a splash:on_request callback in a Lua script. For example, if Tor can be reached from the Splash Docker container as tor:8123, you can make a request like this:

http://<splash-url>:8050/render.html?url=...&proxy=socks5://tor:8123
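The Lua-script route mentioned above could look roughly like the following sketch. It only builds the JSON body you would POST to Splash's `/execute` endpoint (with `Content-Type: application/json`); it does not contact Splash. The helper name `build_execute_payload` and the `proxy_host`/`proxy_port` argument names are made up for illustration, and `tor`/`8123` are simply the host and port assumed in the example above.

```python
import json

# Lua script for Splash's /execute endpoint. It routes every outgoing
# request through a SOCKS5 proxy using request:set_proxy inside a
# splash:on_request callback. proxy_host and proxy_port are hypothetical
# argument names that this sketch passes in alongside the script.
LUA_SOURCE = """
function main(splash)
  splash:on_request(function(request)
    request:set_proxy{
      host = splash.args.proxy_host,
      port = splash.args.proxy_port,
      type = "SOCKS5",
    }
  end)
  assert(splash:go(splash.args.url))
  return splash:html()
end
"""


def build_execute_payload(url, proxy_host="tor", proxy_port=8123):
    """Return the JSON body for a POST to http://<splash-url>:8050/execute."""
    return json.dumps({
        "lua_source": LUA_SOURCE,
        "url": url,
        "proxy_host": proxy_host,  # assumed hostname of the Tor container
        "proxy_port": proxy_port,  # assumed port, taken from the example above
    })


if __name__ == "__main__":
    print(build_execute_payload("http://example.com"))
```

Setting the proxy in the script rather than in the URL is mostly useful when the script already exists for other reasons, or when different requests need different proxies.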

Also, take a look at https://github.com/TeamHG-Memex/aquarium - it sets all of this up: it defines a 'tor' proxy profile, starts Tor in another Docker container, and links the containers. To access a remote website through Tor in a Splash instance deployed via Aquarium, you can just add the proxy=tor GET argument to a request:

http://<splash-url>:8050/render.html?url=...&proxy=tor
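If you build these URLs in code, the target URL and the proxy value should be percent-encoded. Here is a minimal sketch using only the standard library; `render_html_url` is a hypothetical helper name, and `http://localhost:8050` is an assumed Splash address standing in for `<splash-url>`.

```python
from urllib.parse import urlencode


def render_html_url(splash_base, target_url, proxy):
    """Build a render.html URL whose proxy argument is either a proxy URL
    (e.g. socks5://tor:8123) or a proxy profile name (e.g. tor)."""
    query = urlencode({"url": target_url, "proxy": proxy})
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    splash = "http://localhost:8050"  # assumed Splash address
    # Proxy given directly as a URL.
    print(render_html_url(splash, "http://example.com", "socks5://tor:8123"))
    # Proxy given as a profile name, as in an Aquarium deployment.
    print(render_html_url(splash, "http://example.com", "tor"))
```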
Mikhail Korobov
  • Do I have to include the `'proxy': 'tor'` part in my request, like here: `yield SplashRequest(auction_results_url, self.parse_auction_list, endpoint='execute', args = {'lua_source': self.lua_script, 'proxy': 'tor'} ) `? I can't tell from the logs whether the requests are really going through Tor. – Zin Yosrim Jan 25 '18 at 00:57
  • @mikhail-korobov What about (1): how do I run Splash and Tor in the same Docker container, or make Tor accessible from the Splash Docker container? – Krishan Kumar Mourya Feb 19 '18 at 13:50
  • @KrishanKumarMourya It is a matter of configuring Docker. Aquarium uses docker-compose to start several Docker containers at once (a few Splash containers, a load balancer, Tor) and connects them using the "links" feature. – Mikhail Korobov Feb 19 '18 at 14:48