I'd like to check that Tor is working before I start crawling with Python Scrapy. I am using Polipo/Tor/Scrapy on Linux.

With this setup, Scrapy correctly uses Tor for its crawls. The way I check whether Scrapy is using Tor correctly is to crawl this page in my spider:

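import logging
import scrapy

class mySpider(scrapy.Spider):
    name = "myspider"  # Scrapy requires every spider to have a name

    def start_requests(self):
        yield scrapy.Request('https://check.torproject.org/', self.parse)

    def parse(self, response):
        # .get() extracts the heading text rather than logging the SelectorList
        logging.info("Check tor page: %s", response.css('.content h1::text').get())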

However, I think there might be a better/cleaner way of doing it. I know I can check the Tor service status or check the IP address, but I want to actually check whether the Tor connection is correctly established.

PHA

1 Answer

A somewhat definitive way to do this is to connect to Tor's control port and issue GETINFO status/circuit-established.

If Tor has an active circuit built, it will return:

250-status/circuit-established=1
250 OK

If Tor hasn't been used for a while, this could be 0. You can also call GETINFO dormant which would yield 250-dormant=1. Most likely when you then try to use Tor, it will build a circuit and dormant will become 0 and circuit-established will be 1 barring any major network issues.

In either case, dormant=0 or circuit-established=1 should be enough to tell you that you can use Tor.
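The rule above can be sketched as a small parser plus a check. This is only an illustration: the function names `parse_getinfo_reply` and `tor_looks_usable` are made up for this example, and the parser only handles single-line values like the ones shown in this answer.

```python
def parse_getinfo_reply(reply):
    """Turn a GETINFO reply such as '250-dormant=0' into a {key: value} dict.

    Only single-line '250-key=value' entries are handled; the closing
    '250 OK' line carries no '=' and is skipped.
    """
    values = {}
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith("250-") and "=" in line:
            key, _, value = line[4:].partition("=")
            values[key] = value
    return values


def tor_looks_usable(values):
    """Apply the rule of thumb: circuit-established=1 or dormant=0."""
    return (
        values.get("status/circuit-established") == "1"
        or values.get("dormant") == "0"
    )
```

For example, feeding it the reply text shown earlier in this answer gives a dict with `status/circuit-established` set to `"1"`, which `tor_looks_usable` accepts.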

It's a simple protocol so you can just open a socket to the control port, authenticate, and issue commands, or use Controller from Stem.
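A rough sketch of the raw-socket route might look like the following. It assumes password authentication (an empty password is sent when none is given; cookie authentication is not covered), assumes the password contains no characters that need escaping, and only copes with single-line replies like the ones above. The helper names `read_reply`, `send_command`, and `getinfo` are invented for this example.

```python
import socket


def read_reply(stream):
    """Read reply lines until the final one ('NNN ' followed by text)."""
    lines = []
    while True:
        raw = stream.readline()
        if not raw:
            raise ConnectionError("control connection closed unexpectedly")
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        lines.append(line)
        # Intermediate lines look like '250-...'; the last looks like '250 OK'.
        if len(line) >= 4 and line[:3].isdigit() and line[3] == " ":
            return lines


def send_command(stream, command):
    """Send one command and return its reply lines, raising on non-250."""
    stream.write(command.encode("ascii") + b"\r\n")
    stream.flush()
    reply = read_reply(stream)
    if not reply[-1].startswith("250"):
        raise RuntimeError("Tor rejected %r: %s" % (command, reply[-1]))
    return reply


def getinfo(sock, keys, password=""):
    """Authenticate on an already-connected socket, then issue GETINFO."""
    stream = sock.makefile("rwb")
    try:
        send_command(stream, 'AUTHENTICATE "%s"' % password)
        return send_command(stream, "GETINFO " + " ".join(keys))
    finally:
        stream.close()


# Possible usage, assuming ControlPort 9051 is enabled in torrc:
#   with socket.create_connection(("127.0.0.1", 9051)) as sock:
#       print(getinfo(sock, ["status/circuit-established", "dormant"]))
```

Taking a connected socket rather than a host and port keeps the helper easy to exercise without a running Tor instance.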

See the control spec for more info.
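For the Stem route mentioned above, a minimal sketch could look like this. It assumes Stem is installed, that the control port is 9051, and that authentication is by password or not required; `tor_ready` and `check_local_tor` are names made up for this example.

```python
def tor_ready(controller):
    """Return True if the controller reports a built circuit or a non-dormant Tor.

    `controller` is expected to offer get_info(key, default=...), as Stem's
    Controller does.
    """
    established = controller.get_info("status/circuit-established", default="0")
    dormant = controller.get_info("dormant", default="1")
    return established == "1" or dormant == "0"


def check_local_tor(port=9051, password=None):
    """Connect to a local control port with Stem and run the check."""
    # Imported here so the helper above stays usable without Stem installed.
    from stem.control import Controller

    with Controller.from_port(port=port) as controller:
        controller.authenticate(password=password)
        return tor_ready(controller)
```

Calling `check_local_tor()` before starting the crawl would then give a yes/no answer without fetching any page through Tor.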

drew010
  • Thanks for your reply. I just noticed that Twisted works with txtorcon rather than Stem, so I guess I should learn how to issue GETINFO in txtorcon instead. – PHA May 11 '16 at 08:48
  • Looks like it does [torcontrolprotocol.py](https://github.com/meejah/txtorcon/blob/master/txtorcon/torcontrolprotocol.py#L384). Might as well use that, but it's a very simple command based protocol with nothing special for simple commands like that. It does however get more complicated with parsing of certain results and knowing how to identify the end of a message. – drew010 May 11 '16 at 19:14
  • If I understand correctly `dormant` and `circuit-established` will return 1 and 0 respectively if Tor hasn't been used for a while. (1) Do these commands wake up Tor? (2) If not what would be the best way to wake it up? – nopara73 Jun 09 '17 at 04:45
  • Just asking the controller for that info doesn't wake up Tor. You can wake it with a `SIGNAL NEWNYM`, by trying to resolve an address `RESOLVE somehost.net`, or by opening a SOCKS connection and issuing a request (there will be some small lag up front to build circuits and establish the connections before sending the request). I can't say with absolute certainty but I don't think Tor can be dormant if you have established circuits since dormant is described as `zero if Tor is currently active and building circuits, and nonzero if Tor has gone idle due to lack of use or some similar reason`. – drew010 Jun 09 '17 at 16:16
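Following the last comment, waking Tor and waiting for a circuit could be sketched as below. This is an assumption-laden illustration: `wake_tor` is a hypothetical helper, the `controller` argument is expected to behave like Stem's `Controller` (a `signal` method and a `get_info` method), and whether `NEWNYM` is enough to wake Tor on a given setup is taken from the comment rather than verified here.

```python
import time


def wake_tor(controller, timeout=30.0, poll_interval=1.0, sleep=time.sleep):
    """Send NEWNYM, then poll until a circuit is reported or time runs out.

    Returns True once status/circuit-established is "1", False on timeout.
    """
    # With Stem you would typically pass stem.Signal.NEWNYM here.
    controller.signal("NEWNYM")
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if controller.get_info("status/circuit-established", default="0") == "1":
            return True
        sleep(poll_interval)
    return False
```

The `sleep` parameter exists only so the polling loop can be exercised without real delays.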