
I am trying to download some XML files from a given URL. Below is the code I am using:

use strict;
use warnings;

my $url ='https://givenurl.com/';
my $username ='scott';
my $password='tiger';

system("wget --user=$username --password=$password $url") == 0 or die "system execution failed ($?): $!";
local $/ = undef;
open(FILE, "<index.html") or die "not able to open $!";
my $index = <FILE>;
my @childs = map /<a\s+href\=\"(AAA.*\.xml)\">/g , $index;

for my $xml (@childs)
{
  system("wget --user=$username --password=$password $url/$xml");
}

But when I run this, it gets stuck at the wget command inside the for loop. It seems wget is not able to fetch the files properly. Any clues or suggestions?

Thank you.

Man

  • See also: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Nicholas Knight Mar 14 '11 at 11:58
  • Have you tried replacing `system` with `print`? Does it get stuck on the first `system`? What do the `system`s return? – Tim Mar 14 '11 at 13:29
  • It gets stuck after fetching several files. Sometimes it fetches two files, sometimes it gets stuck at the last file. – Man Mar 14 '11 at 13:38

3 Answers

You shouldn't use an external command in the first place. Ensure that WWW::Mechanize is available, then use code like:

use strict;
use warnings;

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

...

$mech->credentials($username, $password);
$mech->get($url);
foreach my $link ($mech->find_all_links(url_regex=>qr/\bAAA/)) {
    $mech->get($link);
    ...
}
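
As a rough sketch of how the elided parts might be filled in, the snippet below saves each matching link to a local file. The helper names local_name and download_xml_files, the AAA...xml pattern (borrowed from the question), and the 60-second timeout are assumptions made for illustration, not part of the original answer.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: pick a local file name from the path part of a URL.
sub local_name {
    my ($path) = @_;
    return basename($path);
}

# Hypothetical helper: fetch the index page, then save every matching link.
sub download_xml_files {
    my ($url, $username, $password) = @_;

    # Loaded at run time so that local_name() stays usable without the module.
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(
        autocheck => 1,    # die on HTTP errors instead of continuing silently
        timeout   => 60,   # passed on to LWP::UserAgent: give up on a stalled server
    );
    $mech->credentials($username, $password);
    $mech->get($url);

    # Collect the links first: each later get() replaces the current page.
    my @links = $mech->find_all_links(url_regex => qr/\bAAA.*\.xml$/);

    for my $link (@links) {
        my $abs = $link->url_abs;    # absolute URL as a URI object
        $mech->get($abs, ':content_file' => local_name($abs->path));
    }
    return scalar @links;
}
```

It could be called as download_xml_files('https://givenurl.com/', 'scott', 'tiger'). The timeout is there because the question describes the download hanging; with it, a stalled request should fail with an error rather than block indefinitely.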
mscha

If $url or $xml contains any shell metacharacters (? and & are common ones in URLs), then you may need to either quote them properly

system("wget --user=$username --password=$password '$url/$xml'");
system qq(wget --user=$username --password=$password "$url/$xml");

or use the LIST form of system, which bypasses the shell

system( 'wget', "--user=$username", "--password=$password", "$url/$xml");

to get the command to work properly.
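
A minimal sketch of the LIST form with status checking follows. The helper names wget_args and fetch are hypothetical, introduced only for this illustration; the point is that each list element reaches wget as a single argument, so characters such as ? and & are never interpreted by a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for wget. With the LIST form
# of system, each element is handed to wget verbatim, without a shell.
sub wget_args {
    my ($username, $password, $url) = @_;
    return ('wget', "--user=$username", "--password=$password", $url);
}

# Hypothetical helper: run wget and report failures instead of ignoring them.
sub fetch {
    my ($username, $password, $url) = @_;
    my $status = system(wget_args($username, $password, $url));
    if ($status == -1) {
        warn "could not start wget: $!\n";
    }
    elsif ($status != 0) {
        warn "wget failed for $url (wait status $status)\n";
    }
    return $status == 0;
}
```

For example, fetch('scott', 'tiger', 'https://givenurl.com/AAA1.xml?x=1&y=2') would pass the whole URL, query string included, as one argument. In the string form, an unquoted & would instead make the shell run wget in the background and treat the remainder as a separate command.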

mob

Maybe the problem is the path to wget. What happens if you use:

system("/usr/bin/wget --user=$username --password=$password $url")

Or I guess it could be a problem with the variables passed to system ($username, $password, $url).

Juan
  • It works for several files but not for others. Moreover, it gets stuck at random points when I run the script multiple times. :( – Man Mar 14 '11 at 14:20
  • Try replacing `system` with `print` and look at the output. Maybe the `$url` is badly formatted; if that is the case, you could use the `URL` module to build the `$url`. Good luck. – Juan Mar 14 '11 at 17:04