
I tried to download files from Google Drive using wget (on Windows) with the script shown in the answer "[JULY 2020 - Windows users batch file solution]" to "wget/curl large file from google drive".

It works well, but when the computer is behind a proxy, it works ONLY if I set the environment variables http_proxy and https_proxy (it may also be possible to set the proxy with a flag on the command line, but I didn't try that).

The fact that I can download files from Google Drive using Chrome without these environment variables, and without configuring Chrome for the proxy, tells me that there is a way to download from behind a proxy without configuring the application for it.

How can I make wget work without having to configure the proxy manually (by flag or by environment variables)?

Bruno Rohée
audi02

2 Answers


In all likelihood your Chrome also has a proxy set up in some way. In any case, the simplest way to define the proxy for wget is to create a .wgetrc file in your home folder and set the following:

> vi ~/.wgetrc

use_proxy=on
http_proxy=http://[proxy_ip]:[proxy_port]
https_proxy=https://[proxy_ip]:[proxy_port]
ftp_proxy=http://[proxy_ip]:[proxy_port]

That should be all you need to do.

Wilmar
  • Thanks. Could it be that Chrome "knows" the proxy IP and port automatically? – audi02 Aug 13 '20 at 19:34
  • If you are on Windows, go to Settings --> Proxy. It could be that you have Automatically Detect Settings enabled (I am not actually sure whether that works in Chrome, see [this](https://stackoverflow.com/questions/191023/how-does-windows-actually-detect-lan-proxy-settings-when-using-automatic-confi)), that you have a setup script, or that there is a manually configured proxy. – Wilmar Aug 14 '20 at 16:06

I found the solution thanks to the link @Wilmar pointed out in his comment (thanks!).

An application can automatically find out whether it is behind a proxy by requesting "http://wpad/wpad.dat". If the network provides proxy auto-configuration, the reply contains a PAC (proxy auto-config) file with the proxy details. The application can then extract the proxy details for whatever settings it needs. That's how Chrome can configure itself for the proxy automatically.

Example using wget in Windows to find proxy details

In Windows, you can use wget as follows to get the proxy server details. The details must be extracted from the resulting text files, and you can use a tool such as jrepl for that task. Here I only show where the details are.

call wget "http://wpad/wpad.dat" -o "ProcessLog.txt" -O "PAC.txt"

There are three possible scenarios here:

  1. If there is no proxy, PAC.txt is empty and ProcessLog.txt contains a message similar to this one:

ProcessLog.txt

--2020-09-01 08:38:29--  http://wpad/wpad.dat
Resolving wpad (wpad)... failed: The requested name is valid, but no data of the requested type was found. .
wget: unable to resolve host address 'wpad'

  2. If there is a proxy server and the Windows proxy environment variables are set:

http_proxy=http://proxy.mc.company.com:777

https_proxy=https://proxy.mc.company.com:777

then wget already knows the proxy address, so PAC.txt is empty and ProcessLog.txt contains a message similar to the following one, which includes the proxy details. In this example, the proxy details are [proxy_ip]:[proxy_port] = proxy.mc.company.com:777

ProcessLog.txt

--2020-09-01 08:29:59--  http://wpad/wpad.dat
Resolving proxy.mc.company.com (proxy.mc.company.com)... 10.100.200.150
Connecting to proxy.mc.company.com (proxy.mc.company.com)|10.100.200.150|:777... connected.
Proxy request sent, awaiting response... 302 Found
Location: http://www.wpad.com/wpad.dat [following]
--2020-09-01 08:30:00--  http://www.wpad.com/wpad.dat
Connecting to proxy.mc.company.com (proxy.mc.company.com)|10.100.200.150|:777... connected.
Proxy request sent, awaiting response... 403 Forbidden
2020-09-01 08:30:00 ERROR 403: Forbidden.

  3. If there is a proxy server but no Windows proxy environment variables are set, wget receives the proxy details in the reply to its request. In this case PAC.txt contains a long text similar to the following one, which includes the proxy details. In this example, the proxy details are [proxy_ip]:[proxy_port] = proxy.mc.company.com:777

PAC.txt

function FindProxyForURL(url,host) {
var me=myIpAddress();
var resolved_ip = dnsResolve(host);
if (host == "127.0.0.1") {return "DIRECT";}
if (host == "localhost") {return "DIRECT";}
if (isPlainHostName(host)) {return "DIRECT";}
if (url.substring(0,37) == "http://lyncdiscoverinternal.company.com") {return "DIRECT";}
if (!resolved_ip) { if (url.substring(0,6) == "https:") {return "PROXY proxy-mc.company.com:778";} else {return "PROXY proxy-mc.company.com:777";}}
if (host == "moran-for-localhost-only.com") {return "DIRECT";}
...
...
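
The extraction step mentioned above (done with a tool like jrepl in the original setup) could also be sketched in Python. This is only an illustration under assumptions: the function name `extract_proxies` and the regular expression are made up for this example, and the code does not evaluate the PAC JavaScript. It just lists every `PROXY host:port` directive that appears in the text, so you still have to decide which one applies to your URL.

```python
import re

# Matches PAC return directives such as: PROXY proxy-mc.company.com:777
# (hypothetical pattern, it assumes plain host names and numeric ports)
PROXY_DIRECTIVE = re.compile(r'\bPROXY\s+([A-Za-z0-9.\-]+):(\d+)')


def extract_proxies(pac_text):
    """Return the unique (host, port) pairs found in a PAC file,
    in order of first appearance."""
    found = []
    for host, port in PROXY_DIRECTIVE.findall(pac_text):
        pair = (host, int(port))
        if pair not in found:
            found.append(pair)
    return found


# One line taken from the PAC.txt example above, used as sample input.
sample_pac = '''
function FindProxyForURL(url,host) {
if (host == "localhost") {return "DIRECT";}
if (!resolved_ip) { if (url.substring(0,6) == "https:") {return "PROXY proxy-mc.company.com:778";} else {return "PROXY proxy-mc.company.com:777";}}
}
'''

for host, port in extract_proxies(sample_pac):
    print(f"{host}:{port}")
```

An empty PAC.txt simply yields an empty list, which corresponds to the case where no proxy details were returned.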

Simplifying the use of wget in Windows to find proxy details

When using wget to find the proxy details, we can tell it to ignore the proxy environment variables (if they are set) with the flag --no-proxy. This leaves only two possible scenarios, (1) and (3) described above, so we just need the PAC.txt file. If it is empty (scenario 1), there is no proxy; if it contains text (scenario 3), the computer is behind a proxy and you can extract the proxy details from it.

call wget --no-proxy "http://wpad/wpad.dat" -O "PAC.txt"
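
For comparison, the same check can be sketched in Python with only the standard library. This is a rough equivalent, not a replacement for the wget command: the names `fetch_pac`, `describe` and `main` are made up for this example, and passing an empty `ProxyHandler` is used here as the counterpart of `--no-proxy`, so that any proxy environment variables are ignored.

```python
import urllib.request

WPAD_URL = "http://wpad/wpad.dat"  # same URL the wget command requests


def fetch_pac(url=WPAD_URL, timeout=5.0):
    """Return the PAC text, or an empty string if the request fails."""
    # An empty ProxyHandler makes urllib skip http_proxy/https_proxy,
    # similar in spirit to wget's --no-proxy flag.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError are subclasses of OSError; this also
        # covers the "unable to resolve host" case from scenario 1.
        return ""


def describe(pac_text):
    """Map the PAC text to scenario 1 or 3 from the answer above."""
    if pac_text.strip():
        return "scenario 3: PAC file received, proxy details are inside"
    return "scenario 1: no PAC file received, probably no proxy"


def main():
    print(describe(fetch_pac()))
```

Calling `main()` performs the actual network request; `describe` can be used on its own with the contents of an existing PAC.txt.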
audi02