The problem
I'm testing an HTTP proxy (polipo) that wraps a SOCKS proxy (Tor). It works fine for normal URLs, but I'm getting strange results with some .onion addresses.
In this example, I'm pointing at "the hidden wiki". The output looks like garbage:
$ curl --proxy localhost:8118 http://kpvz7ki2v5agwt35.onion/
m�AO�@�����ۑp��ĖPbj
Background
Fetching the TORCH hidden service through the same proxy works fine:
$ curl --proxy localhost:8118 http://xmh57jrzrnw6insl.onion/
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>TORCH: Tor Search!</title>...
Similarly, normal URLs seem ok:
$ curl --proxy localhost:8118 https://check.torproject.org/ | grep Congratulations
<img alt="Congratulations. Your browser is configured to use Tor." src="/images/tor-on.png">
Congratulations. Your browser is configured to use Tor.<br>
The proxy is polipo, running with the following configuration:
proxyName = "localhost"
proxyAddress = "127.0.0.1"
proxyPort = 8118
allowedClients = 127.0.0.1
allowedPorts = 1-65535
cacheIsShared = false
chunkHighMark = 67108864
socksParentProxy = "localhost:9050"
socksProxyType = socks5
diskCacheRoot = ""
localDocumentRoot = ""
disableLocalInterface = true
disableConfiguration = true
disableVia = true
dnsUseGethostbyname = yes
maxConnectionAge = 5m
maxConnectionRequests = 120
serverMaxSlots = 8
serverSlots = 2
tunnelAllowedPorts = 1-65535
Possible causes
My thoughts on possible causes:
- The server is responding with garbage as some kind of anti-web-crawler measure.
- There is something wrong with the way I'm handling the response.
- Polipo is messing it up.
- Polipo is messing it up.
- Something else...
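One way to narrow these down might be to save the response headers and the raw body separately and look at what actually came back, rather than letting the body print to the terminal. The sketch below is only a diagnostic idea, not a confirmed explanation: it assumes the body could be a compressed payload, and `is_gzip`, `headers.txt` and `body.bin` are names made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of curl or polipo): succeeds if the given
# file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    # Dump the first two bytes as hex and strip whitespace, e.g. "1f8b".
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Usage sketch (needs the proxy from the question to be running):
#   curl --proxy localhost:8118 --silent --dump-header headers.txt \
#        --output body.bin http://kpvz7ki2v5agwt35.onion/
#   grep -i '^content-encoding' headers.txt
#   is_gzip body.bin && echo "body looks gzip-compressed"
```

If the headers show a `Content-Encoding` and the body starts with those magic bytes, the "garbage" may simply be a compressed response that is not being decoded. If not, the other causes remain open.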
Thoughts?