
I'm trying to extract the info hash from a torrent magnet link using a Perl regex.
The magnet link looks like:

magnet:?xt=urn:btih:8AC3731AD4B039C05393B5404AFA6E7397810B41&dn=ubuntu+11+10+oneiric+ocelot+desktop+cd+i386&tr=http%3A%2F%2Ftracker.openbittorrent.com%2Fannounce

But sometimes it looks like:
magnet:?xt=urn:btih:8AC3731AD4B039C05393B5404AFA6E7397810B41

The part I'm trying to extract is 8AC3731AD4B039C05393B5404AFA6E7397810B41

I'm trying to capture everything up to the first '&' or, if the link contains only the info hash, up to the end of the line. I've tried a couple of ways but can't get it to work correctly.
What I have below only captures the first character:

if ($tmpVar =~ m/magnet\:\?xt=urn\:btih\:([[:alnum:]]+?)/i) {
  $mainRes{'hash'} = $1;
}

I also tried adding &|$ after the capture, but that just results in an error.
Thanks

Kr0nZ
  • "just results in an error" -- this statement is next to useless. Instead say what the specific error is. – TLP Mar 01 '12 at 19:29

3 Answers


You could use:

/\burn:btih:([A-F\d]+)\b/i

Or if the hash is always 40 chars:

/\burn:btih:([A-F\d]{40})\b/i
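
To see how the 40-character pattern behaves on both link forms from the question, here is a minimal sketch. The helper name extract_info_hash is made up for illustration and is not part of the answer above; it assumes the hash is always 40 hex characters.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): returns the info hash
# from a magnet link, or undef when no 40-character hex hash is present.
sub extract_info_hash {
    my ($link) = @_;
    return $link =~ /\burn:btih:([A-F\d]{40})\b/i ? $1 : undef;
}

# The two link forms from the question: with and without extra parameters.
my @links = (
    'magnet:?xt=urn:btih:8AC3731AD4B039C05393B5404AFA6E7397810B41&dn=ubuntu+11+10+oneiric+ocelot+desktop+cd+i386',
    'magnet:?xt=urn:btih:8AC3731AD4B039C05393B5404AFA6E7397810B41',
);

for my $link (@links) {
    my $hash = extract_info_hash($link);
    print defined $hash ? "$hash\n" : "no hash found\n";
}
```

The trailing \b is what stops the match at the '&' or at the end of the string, so no explicit &|$ alternation is needed.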
Qtax

As you've already discovered, you don't want the ? after the + in your regular expression. Here's why:

The ? in pattern+? makes your regex "non-greedy", meaning it will try to use as few characters as possible while still matching the pattern you specify. So

"8AC3731AD4B039C05393B5404AFA6E7397810B41" =~ /(\w+?)/

just returns "8" while

"8AC3731AD4B039C05393B5404AFA6E7397810B41" =~ /(\w+)/

returns the whole string. So your pattern works once the ? is dropped:

if ($tmpVar =~ m/magnet:\?xt=urn:btih:([[:alnum:]]+)/i) {
    $mainRes{'hash'} = $1;
}
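
For completeness, the non-greedy quantifier can also work if something after the capture forces it to keep going, which is presumably what the &|$ attempt in the question was aiming for. A minimal sketch, assuming the alternation is wrapped in a non-capturing group; the helper name lazy_info_hash is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration): the lazy +? is followed
# by a group that must match either '&' or the end of the string, so the
# capture is extended until one of those is reached.
sub lazy_info_hash {
    my ($link) = @_;
    return $link =~ m/magnet:\?xt=urn:btih:([[:alnum:]]+?)(?:&|$)/i ? $1 : undef;
}

my $short = 'magnet:?xt=urn:btih:8AC3731AD4B039C05393B5404AFA6E7397810B41';
my $long  = $short . '&dn=ubuntu+11+10+oneiric+ocelot+desktop+cd+i386';

# Both forms print the same 40-character hash.
print lazy_info_hash($_), "\n" for $short, $long;
```

Without the parentheses, a bare &|$ splits the whole pattern into two alternatives, which is probably not what was intended.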
mob

This is why the gods of CPAN gave us URI, to parse out parts of URIs, which you can then parse with a regex.

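#!/usr/bin/perl
use strict;
use warnings;
use URI;
use URI::QueryParam;

# The magnet URI is taken from the first command-line argument
my $u  = URI->new( shift() );
my $xt = $u->query_form_hash->{xt};
defined $xt or die "No xt parameter found\n";

# Strip the urn:btih: prefix to leave just the info hash
my ($hash) = $xt =~ m{^urn:btih:(.*)$};
print "$hash\n";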

This assumes the magnet URI is passed as the first command-line argument.

MkV