Update

Working on a theory, I edited LWP/Protocol/http.pm to include a sleep statement in the subroutine request:

if (!$has_content || $write_wait || $has_content > 8*1024) {
  WRITE:
    {
        # Since this just writes out the header block it should almost
        # always succeed to send the whole buffer in a single write call.
        my $n = $socket->syswrite($req_buf, length($req_buf));
        sleep 2;   ## <----- NEW 
        unless (defined $n) {
        ...

And the get statement worked, returning a 200 OK. Many thanks to Alan Curry for helping with the debugging and for finding this particular place in the code.

Not sure it completely answers the question, or if the solution works long term. Will have to do some more checking.

Summary:

  • LWP::UserAgent module using the get subroutine fails for some URLs, reporting 500 timeout.
  • Only some URLs fail. E.g. www.google.com fails, but www.google.se succeeds.
  • I have no other connection issues, all URLs are reachable with browser and through cmd programs such as ping.
  • Because of this problem, I cannot install modules for perl with CPAN or ActivePerl's ppm.
  • The problem persisted after installing another perl distribution.
  • Weirdly enough, using the debugger and stepping through the code makes the failing URLs succeed.
  • I am using a firewall, and perl is allowed to make connections. (Not relevant, since some URLs succeed)
  • Firewall log shows perl being allowed to connect for both failing URLs and non-failing. (See below) The log also shows sockets opening to listen, but timestamps are mismatched for failing connections.

Goal

  • I'm primarily looking for any solution to be able to install modules.
  • I'm interested in all suggestions on how to debug the problem, complete solutions not required. Any hints or tips are welcome.

Elaboration

I have been using ActivePerl v5.14 for some time. Installing modules with their Perl Package Manager ppm command and gui worked very well, but at some point stopped working, reporting a 500 timeout. The cpan shell reported the very same thing.

I have googled this problem extensively, but found nothing that relates to my problem, or helps in any way.

ActivePerl support claims it may be a proxy setting, which is ludicrous. I have lots of programs that connect to the internet without any proxy settings, and as far as I know I do not need one. I have tried to find out what my proxy settings are, if any, but all I have found are vague references such as "use the system settings", "no proxy required" and "proxy is the same as your IP".

So last night I had enough and installed strawberry perl instead, but it suffers from the same problem. I uninstalled ActivePerl afterwards.

Anyway, I have experimented with LWP modules and found that I can reproduce the errors there. It seems it is limited to certain websites, and cpan is one of them (?). I created this script for testing:

use strict;
use warnings;

use LWP::UserAgent;
use URI;

my $ua = LWP::UserAgent->new;
my $url = shift;
my $u = URI->new($url);
$ua->no_proxy('cpan.strawberryperl.com','cpan.com',$u->host);
$ua->timeout(30);
my $r = $ua->get($url);
if ($r->is_success) {
    print $r->decoded_content;
} else {
    die $r->status_line;
}

And then did some testing:

tx.pl http://cpan.strawberryperl.com/authors/01mailrc.txt.gz
500 read timeout at tx.pl line 23.

tx.pl http://stackoverflow.com
500 read timeout at tx.pl line 23.

tx.pl http://www.google.se
<!doctype html><html itemscope itemtype="http://schema.org/WebPage"><head><meta
http-equiv="content-type" content="text/html; charset=ISO-8859-1"><meta ...

So, www.google.se works, and www.youtube.com also works, but www.yahoo.com and search.cpan.com fail. The default timeout of 180 seconds makes this an incredibly annoying thing to debug, which is why I reduced it in my script. Needless to say, all of these URLs are reachable if I try to reach them with Firefox or ping.


ETA:

Strangely enough, running the script through the debugger, turning on trace and skipping to the end makes the previously failed connections successful.

It would seem to imply that there is some kind of hiccup, missed timing that is "fixed" when the script runs more slowly due to printing thousands of lines of trace code.


I could understand this issue as being a result of some ActivePerl module getting corrupted, but strawberry perl is using a completely different set of files, so it must be my system.

Why some sites work and some don't is baffling. I could understand that some sites like stackoverflow.com would protect themselves against potential bots, but why cpan would thwart its own package manager makes no sense.

I am using a firewall, and Perl has been allowed to make connections. My system is a rather old installation of Windows XP (~5 years). While running dual boot with Ubuntu I've never encountered this problem, which is another clue that it is not something to do with proxies.

I am well and truly stumped. If anyone could help me debug this, I would be very grateful.

The CPAN shell error messages are below. The funny thing is, it says it tries to use the external ftp command as a last resort, but I just discovered that the ftp command has not been allowed by my firewall, so if it had been used, the firewall should have asked me for permission.

Fetching with LWP:
http://cpan.strawberryperl.com/authors/01mailrc.txt.gz
LWP failed with code[500] message[read timeout]
Warning: no success downloading 'D:\strawberry\cpan\sources\authors\01mailrc.txt
.gz.tmp1252'. Giving up on it.
Fetching with LWP:
http://www.cpan.org/authors/01mailrc.txt.gz
LWP failed with code[500] message[read timeout]
Warning: no success downloading 'D:\strawberry\cpan\sources\authors\01mailrc.txt
.gz.tmp1252'. Giving up on it.
Warning: no success downloading 'D:\strawberry\cpan\sources\authors\01mailrc.txt
.gz.tmp1252'. Giving up on it.

As a last resort we now switch to the external ftp command 'C:\WINDOWS\system32\
ftp.EXE'
to get 'D:\strawberry\cpan\sources\authors\01mailrc.txt.gz.tmp1252'.

Doing so often leads to problems that are hard to diagnose.

If you're the victim of such problems, please consider unsetting the
ftp config variable with

    o conf ftp ""
    o conf commit

Please check, if the URLs I found in your configuration file
(http://cpan.strawberryperl.com/, http://www.cpan.org/) are valid. The
urllist can be edited. E.g. with 'o conf urllist push ftp://myurl/'

Could not fetch authors/01mailrc.txt.gz

Firewall log for trying to fetch non-failing URL (www.google.se) and failing (stackoverflow.com):

2012-06-27T18:34:04+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,listen,17,0.0.0.0,56564
2012-06-27T18:34:04+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,send,17,195.54.122.198,53
2012-06-27T18:34:13+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,64.34.119.12,80
2012-06-27T18:34:13+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,64.34.119.12,80
2012-06-27T18:34:21+01:00,info,appl control,C:\Program\Mozilla Firefox\firefox.exe,allow,connect out,6,74.86.70.106,80
2012-06-27T18:34:28+01:00,info,appl control,C:\Program\Mozilla Firefox\firefox.exe,allow,connect out,6,64.34.119.12,80
2012-06-27T18:34:30+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,listen,17,0.0.0.0,56664
2012-06-27T18:34:30+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,send,17,195.54.122.198,53
2012-06-27T18:34:30+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,74.125.143.94,80
2012-06-27T18:34:30+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,74.125.143.94,80
2012-06-27T18:35:14+01:00,info,appl control,C:\Program\Mozilla Firefox\firefox.exe,allow,connect out,6,64.34.119.12,80
2012-06-27T18:35:21+01:00,info,appl control,C:\Program\Mozilla Firefox\firefox.exe,allow,connect out,6,74.86.70.106,80
2012-06-27T18:36:21+01:00,info,appl control,C:\Program\Mozilla Firefox\firefox.exe,allow,connect out,6,74.86.70.106,80
2012-06-27T18:37:04+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,listen,17,0.0.0.0,61215
2012-06-27T18:37:04+01:00,info,appl control,C:\WINDOWS\system32\svchost.exe,allow,send,17,195.54.122.198,53
2012-06-27T18:37:07+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,64.34.119.12,80
2012-06-27T18:37:07+01:00,info,appl control,D:\strawberry\perl\bin\perl.exe,allow,connect out,6,64.34.119.12,80
TLP
  • http://stackoverflow.com/questions/9379446/perl-cpan-module-install-broken – Ωmega Jun 25 '12 at 18:28
  • What will CPAN do if you locate LWP package and rename it for a while, so it will become unavailable? Will be LWP reinstalled? – Ωmega Jun 25 '12 at 18:30
  • @user1215106 The LWP module files should be just fine, I installed them fresh yesterday. That other question you linked concerns another error message. The link from daxim was interesting, but contained no suggested solutions. – TLP Jun 25 '12 at 18:36
  • @user1215106 Yes, that is the link I mentioned. It does not, however, contain any solutions. And it would not explain why some sites work and some don't. – TLP Jun 25 '12 at 18:41
  • see http://stackoverflow.com/questions/6611382/how-to-install-module-strawberry-perl-issues – Ωmega Jun 25 '12 at 18:42
  • @user1215106 I cannot find out what my proxy is. The internet settings in my browser simply says "use system settings". Either it is a well-guarded secret, or I do not have a proxy. – TLP Jun 25 '12 at 18:58
  • Try **cpan>** `o conf init` or check your `.../CPAN/MyConfig.pm` – Ωmega Jun 25 '12 at 19:10
  • based on http://search.cpan.org/dist/CPAN/lib/CPAN/FirstTime.pm it may be `Config.pm` or `MyConfig.pm` – Ωmega Jun 25 '12 at 19:17
  • Since you mention it works fine with Ubuntu on the same network, it's likely to be something about your Windows installation. Have you tried turning off Windows Firewall and seeing if that makes a difference? You also could look into `Net::Config` or `.libnetrc` which contain the configuration options for various Perl network modules. Find it with `perldoc -l Net::Config`. – Schwern Jun 25 '12 at 19:26
  • @Schwern Turning off the firewall did not change anything. I am afraid I would not know anything about those configuration options, and it still would not explain why some sites can be contacted and some not. – TLP Jun 25 '12 at 19:37
  • Are you able to access these URLs in a web browser? – brian d foy Jun 25 '12 at 23:11
  • Yes, like I said "Needless to say, all of these URLs are reachable if I try to reach them with Firefox or ping." – TLP Jun 26 '12 at 02:17
  • Use the debugger to find the different code path for working/non-working sites. I assume that resolving hostnames works fine. – daxim Jun 26 '12 at 08:40
  • @daxim I'm afraid I don't know much about networks or the debugger. I just single stepped through thousands of lines of code in the debugger with an address I expected to fail (`http://www.google.com`), and when I got to the end, it DIDN'T fail. But running it again without stepping slowly through the code did fail. Re: resolving hostnames: I can use `nslookup` on all the URLs mentioned. – TLP Jun 26 '12 at 18:11
  • These sort of [Heisenbugs](http://enwp.org/Heisenbug) are nigh impossible to diagnose remotely. Go find an expert and sit him/her in front of your computer. – daxim Jun 26 '12 at 18:14
  • @briandfoy Sorry, forgot to add your name to my response.. Yes, I have no problem reaching any site with my browser. Also, a simple ping request goes through without delay. – TLP Jun 26 '12 at 18:18
  • @daxim I just wish there was a way to observe what happens more closely. The difference between google.com and google.se seems to imply that added levels of redirection causes the timeout. As for experts: You're the only ones I have. =P – TLP Jun 26 '12 at 18:22
  • @daxim Funny enough, I found a way to successfully connect: I run the script in the debugger with trace = on. After hitting "c" and scrolling through to the end, it seems to have succeeded. Works with all the URLs that previously failed. – TLP Jun 26 '12 at 19:05
  • I have noticed flakiness in LWP similar to this in the past. Could be a bug or maybe an unconventional protocol implementation that forces an edge case in your NIC driver. (I run into this kind of thing all the time with OpenGL and graphics card drivers.) You might get some insight by using WireShark to watch both a successful connect with tracing turned on and an unsuccessful one with it off. At least this will tell you at what stage of the protocol things are dying. – Gene Jun 29 '12 at 02:53
  • @Gene Interesting tool. I never knew so much was going on. Unfortunately I've no idea how it works, and all this data doesn't tell me much. I can see it sending and receiving 200 OK from google.com, but perl still times out. – TLP Jun 29 '12 at 03:19
  • LWP has an online bug tracker https://rt.cpan.org/Public/Bug/Report.html?Queue=LWP-Online . If you post the trace for both a successful and unsuccessful connect, they should certainly be interested. – Gene Jun 29 '12 at 04:24
  • voted to delete my answer as it was definitely not related to your issue :( – Gergely Szilagyi Jul 01 '12 at 11:38
  • @TLP There might be a lot of irrelevant differences between the .com and .se connections. Can you get a pair of successful/unsuccessful on the same URL? Also, wireshark comes with a command called tshark that outputs text. `tshark -V -r yoursavedfile` should do it – Alan Curry Jul 01 '12 at 11:39
  • @AlanCurry It outputs a 1,3Mb text file.. lots of spam in there. Not pasteable by any means. – TLP Jul 01 '12 at 12:03
  • @TLP you can supply a filter rule to select only the interesting parts, like `tshark -V -r yoursavedfile -f 'host 74.125.225.223'` or you can use the main wireshark program to select a single stream and save it. The idea was upload the files and give us a link, not a text paste. Your desire to censor them first was unexpected. – Alan Curry Jul 01 '12 at 12:14
  • @AlanCurry I did not want to post my IP. I can find an option that apparently applies a filter `tcp.stream eq 0` (0,1,2) which looks like it cuts out a lot of spam, but perhaps too much. Cannot apply the filter you supplied to a capture file. – TLP Jul 01 '12 at 12:30
  • @TLP yes those would be good, to separate the streams and cut out all the noise. A successful GET of the google home page shouldn't be very many packets, and a timeout failure even less. – Alan Curry Jul 01 '12 at 12:46
  • @AlanCurry Ok, this is what I managed to get out of the trace: http://www.sendspace.com/file/6mwztf No idea if this file upload site works, I just picked one at random. – TLP Jul 01 '12 at 13:39
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13269/discussion-between-alan-curry-and-tlp) – Alan Curry Jul 01 '12 at 13:47
  • Given the problems seem to happen more often with sites using CDNs or Akamai service, I would suspect this could be a networking issue with MTU size and packet fragmentation. Your browser may cope better with fragmentation/retries than LWP does. Try [testing your MTU limit](http://www.sevenforums.com/tutorials/94721-mtu-limit-test-change-your-connections-mtu-limit.html) to see if this is the culprit. Are you also connecting over a wireless network? If so, the details of the router may be relevant; if possible you could try connecting wired to see if that resolves (again, suspecting MTU). – Stennie Jul 03 '12 at 11:59
  • @Stennie I've already had a guy leave an answer about MTU (he deleted it). I replied that I have a 100mbit connection, and he said that made it unlikely to be an MTU problem. – TLP Jul 03 '12 at 15:27
  • @TLP .. did you actually try testing the MTU size (ping with no fragmentation)? Would be good to rule that out. Unlikely doesn't mean unpossible ;-). – Stennie Jul 03 '12 at 21:14
  • @Stennie I did not have any matching options in netsh, but the ping test maxed out at 1472 (1473 failed). – TLP Jul 03 '12 at 21:56
  • @TLP .. given the default MTU is 1500, it seems likely this is in fact your problem. Try [changing the MTU](http://my.bergersoft.net/2010/05/13/how-to-change-mtu-size-on-windows-xpvista72008/) to 1472 and re-testing LWP. – Stennie Jul 03 '12 at 23:35
  • @Stennie My MTU is already 1472... why would I change it to the same value? – TLP Jul 04 '12 at 00:13
  • @TLP: ok, if your MTU for Windows is already set to 1472 then guess that path of investigation is dead. Thought you may only have been testing with the `ping -f -l #` command to find the max. – Stennie Jul 04 '12 at 00:25
  • @Stennie That's what I already told you. =P The ping maxed out at 1472. – TLP Jul 04 '12 at 00:37
  • @Stennie I managed to do some juggling and find the setting, using the command `netsh interface ip show interface`. It says I have three connections, much like what is showing up in Network (from control panel icon). One called Loopback with MTU 32768, one wireless (disabled) and one that seems to be the one I am using, with MTU = 1500. – TLP Jul 04 '12 at 00:59
  • @TLP: so to be clear, the ping maxed out at 1472 *and* you checked that your MTU is actually set to 1472 as well? I'm suggesting changing your network MTU based on the results of the ping test, unless it is already at 1472. – Stennie Jul 04 '12 at 01:00
  • @Stennie The ping maxed out at 1472, the MTU is set to 1500. Checking the microsoft documentation it says 1500 is the default for my connection. Also, it says that changing the MTU is fairly complicated. – TLP Jul 04 '12 at 01:05
  • @TLP: since you are using Windows XP, try [Dr. TCP](http://www.dslreports.com/drtcp) as an easier way to change the MTU. You should reboot after changing the MTU. – Stennie Jul 04 '12 at 03:53
  • Wow this thing is still going... you might get more people to look at it if the test case wasn't so difficult to understand. Can you write a small script using IO::Socket::INET instead of LWP, reproducing the effect by imitating LWP's HTTP request? – Alan Curry Jul 04 '12 at 04:01
  • @TLP .. actually realised that 1472 is the expected setting, as there are 28 bytes in the network header (so 1500 MTU). Will delete my answer, as it doesn't resolve. – Stennie Jul 04 '12 at 05:42
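
The reduced test case suggested in the comments (talking to the server directly with IO::Socket::INET instead of going through LWP) could be sketched roughly as follows. This is only a minimal sketch under assumptions: the helper names `build_request` and `fetch_status_line`, the User-Agent string, and the optional delay argument are all made up for illustration, and the request is not byte-for-byte what LWP sends. The delay after the write mirrors the `sleep 2` experiment from the update, so the same host can be tried with and without it.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build a minimal HTTP/1.1 GET request (hypothetical helper, not LWP's exact output).
sub build_request {
    my ($host, $path) = @_;
    return join "\r\n",
        "GET $path HTTP/1.1",
        "Host: $host",
        "Connection: close",
        "User-Agent: raw-socket-test/0.1",    # made-up agent string
        "", "";                               # blank line ends the header block
}

# Connect, send the request in one syswrite, optionally pause,
# then return the first line of the response (undef on failure).
sub fetch_status_line {
    my ($host, $path, $delay) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
        Timeout  => 30,    # applies to the connect, not to later reads
    ) or return;
    my $req = build_request($host, $path);
    my $n   = $sock->syswrite($req, length $req);
    return unless defined $n;
    sleep $delay if $delay;    # imitates the "sleep 2" experiment
    my $line = <$sock>;        # blocking read of the status line
    close $sock;
    return $line;
}

# Only touch the network when a host is given on the command line.
if (@ARGV) {
    my ($host, $delay) = @ARGV;
    my $status = fetch_status_line($host, '/', $delay // 0);
    print defined $status ? $status : "no response\n";
}
```

Saved under a hypothetical name such as rawget.pl, it could be run as `rawget.pl www.google.com` and then `rawget.pl www.google.com 2` to compare the two cases. Note that the read has no timeout of its own here, so a failing case may simply hang rather than report an error.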

1 Answer

This might not be a complete solution to your problem. But here it is anyway:

From your "detailed" problem description it looks like it's a problem with your desktop/laptop. Even though your firewall allows connections to websites as you mentioned, "FTP" might not be allowed by the Windows internal firewall.

Usually, ports 20 (FTP data port) and 21 (FTP command port) should have been added to the firewall exceptions (in Windows: Start → Settings → Control Panel → Security Center → Firewall → Exceptions (tab) → Add ports). You can try adding ports 20 and 21 to the exceptions.

However, if you are connected to a router, you may have to forward ports 20 and 21, although these ports are usually forwarded by default. If you are on a corporate VPN, it's a whole different story: corporate VPNs mostly restrict port 21 explicitly but allow port 22 (used by SFTP, the secure alternative to FTP on port 21). Under such circumstances you may want to use ftp_proxy.

Alternatively (if you don't want to add ports 20 and 21 to the exceptions), you can go to the cpan prompt and set an ftp_proxy with:

cpan> o conf ftp_proxy http://your.ftpproxy.com

and then issue the install <module> command. Or you can update your ../CPAN/config.pm file to make permanent changes to the ftp_proxy parameter.

Well, these may be the traditional solutions, which you have probably already tried. The next step would be to try setting FTP_PASSIVE mode to 1. By default the libnetcfg configuration for this is set to 0. To change this, find the libnetcfg.bat file (it should be somewhere under C:\Perl\bin). Open the file in an editor and replace

ftp_int_passive      0

with

ftp_int_passive      1

This is the Windows batch file that runs once CPAN is invoked to set the environment variables. On UNIX/Linux-like systems the configuration lives in libnet.cfg, and the setting is also available as the environment variable FTP_PASSIVE, like

$ set | grep FTP_PASSIVE
FTP_PASSIVE=0

so to set it, just run export FTP_PASSIVE=1.
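
To see which passive-mode value the Perl network modules would actually pick up, something like the following sketch may help. It assumes the `Net::Config` module from libnet is installed (it exports `%NetConfig`); the helper `effective_passive` is a made-up name, and it deliberately ignores the case where an FTP firewall is configured or a `Passive` option is passed explicitly.

```perl
use strict;
use warnings;
use Net::Config;    # part of libnet; exports %NetConfig

# Hypothetical helper: the value that would apply when no Passive option
# is passed and no FTP firewall is configured. The environment variable
# is checked first, then the libnet configuration.
sub effective_passive {
    return $ENV{FTP_PASSIVE} if exists $ENV{FTP_PASSIVE};
    return $NetConfig{ftp_int_passive};
}

my $from_cfg = $NetConfig{ftp_int_passive};
my $eff      = effective_passive();

print "ftp_int_passive (libnet config): ", defined $from_cfg ? $from_cfg : '(unset)', "\n";
print "FTP_PASSIVE (environment):       ", exists $ENV{FTP_PASSIVE} ? $ENV{FTP_PASSIVE} : '(unset)', "\n";
print "effective value:                 ", defined $eff ? $eff : '(unset)', "\n";
```

For a single connection, passing `Passive => 1` to `Net::FTP->new` should have the same effect without touching any configuration file.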

These are a few of the many ways of debugging this. Honestly, there is no point fiddling around with the library code, as it works well on every other machine; usually 95% of 01mailrc.txt.gz.tmp1252 download issues are due to network/OS/firewall issues. But if you want to expand your Perl knowledge of LWP, you can. In fact, you should also be looking at CPAN::FTP::netrc. Best of luck...

Anjan Biswas