In Perl, how can I extract part of the hostname from a URI?

Question

I want to find a string that begins with http:// and ends with .com, but the http:// and .com themselves don't need to be printed.

$str = "http://example.com";
$str =~ /http:\/\/example.com/;$result = "$&\n";
print $result;

This should do essentially the same as the following Python:

#!/usr/bin/python
import re
str = 'http://example.com'
search = re.search(r'http://(\w+).com', str)
if search:
  print search.group(1)

It prints only "example". How can I do this in Perl?

Why was this question, as were many of its ostensibly correct answers, downvoted without comment? The OP may be weak on applied regexen, and regexen may be a poor tool for the job, but the question strikes me as quite legitimate. +1 to compensate. — pilcrow, May 19 '12 at 03:10
@pilcrow Multiple question marks, leaning toothpicks, and not using capturing in the Perl version even though it is present in the Python version indicates a lack of effort to me. That's the reason for my `-1`. I improved the title, but there is no reason to reward posters who don't take the time to compose a decent question. — Sinan Ünür, May 19 '12 at 03:38
@SinanÜnür, thank you for outlining your reasons. I agree, FWIW. As a separate matter, and IMHO, downvoting *without explanation* is the worse crime. Silent downvotes are merely punitive to the poster, whereas articulated downvotes are remedial. — pilcrow, May 19 '12 at 03:54

score 3 · Answer 1 · edited May 19 '12 at 03:35

Robust solution with a specialised parser:

use feature 'say';
use strict; use warnings;

use URI;
use URI::Find;

URI::Find->new(sub {
    my $uri = shift;
    say $uri->host =~ m{(\w+)[.]com\z};
})->find(\ (my $x = q{http://example.com/}) );

edited May 19 '12 at 03:35

Sinan Ünür

answered May 18 '12 at 10:09

daxim

I got an error: Can't locate URI/Find.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) – user1070579 May 18 '12 at 10:23
From the [Stack Overflow Perl FAQ](http://stackoverflow.com/questions/tagged/perl?sort=faq): [What's the easiest way to install a missing Perl module?](http://stackoverflow.com/questions/65865/whats-the-easiest-way-to-install-a-missing-perl-module) – daxim May 18 '12 at 10:36
① Not quite correct. As written (assuming `$text = 'http://example.com'`), this code prints "example.com1\n". You don't mean to assign the return value of `print` to `$host`. ② The call to `URI->new` is superfluous, as `$uri` is already a subclass of `URI`. ③ This approach *modifies* `$text`, which may or may not be acceptable but is certainly worth noting. – pilcrow May 19 '12 at 02:56
Update: Sinan Ünür has remedied the above. – pilcrow May 19 '12 at 03:57

score 0 · Answer 2 · edited May 19 '12 at 03:23

Try this simple code:

$str = 'http://example.com'; 
print "$_\n" for $str =~ m{\A http:// (\w+) [.] com \z}x;

To ensure the pattern matches the whole string rather than a fragment of it, anchor the pattern at the beginning, \A, and at the end, \z. Use a pattern delimiter other than / to avoid leaning toothpick syndrome, and use the x option to make your pattern more readable.

You need to use (...) to capture the part you want to extract.

You can test this code on ideone.com
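As a minimal sketch of why the anchors matter, the following compares the anchored pattern with the same pattern without \A and \z. The helper names `extract_label` and `extract_label_unanchored` and the second test URL are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the label
# between "http://" and ".com", or undef unless the whole string matches.
sub extract_label {
    my ($str) = @_;
    my ($label) = $str =~ m{\A http:// (\w+) [.] com \z}x;
    return $label;
}

# The same pattern without \A and \z, for comparison.
sub extract_label_unanchored {
    my ($str) = @_;
    my ($label) = $str =~ m{http:// (\w+) [.] com}x;
    return $label;
}

# The second URL is an invented example of input with trailing text.
for my $str ('http://example.com', 'http://example.com.evil.test/') {
    my $anchored   = extract_label($str);
    my $unanchored = extract_label_unanchored($str);
    printf "%-32s anchored=%s unanchored=%s\n",
        $str,
        defined $anchored   ? $anchored   : '(no match)',
        defined $unanchored ? $unanchored : '(no match)';
}
```

The anchored version rejects the second string because text follows `.com`, while the unanchored version still captures `example` from it. Which behaviour you want depends on whether the input is a bare URL or a URL embedded in longer text.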

edited May 19 '12 at 03:23

Sinan Ünür

answered May 18 '12 at 12:48

Ωmega

@SinanUnur, your edit went well beyond merely substituting `\w+` for `.*`. Indeed, it broke the code, which was, if nothing else, at least correct for the contrived example given. (Now it fails to compile with *"Search pattern not terminated..."*) I'd hate to think this user was downvoted because of it. +1 to compensate. – pilcrow May 19 '12 at 03:01
@pilcrow You have enough rep to fix a simple typo in my edit. The edit msg pointed out the most important aspect of the edit. I still think @daxim's answer has the right idea, so I do not see the point of voting this answer up. While I did not vote it down either, I can see why someone might have thought using `URI` is better than fiddling with regex patterns. – Sinan Ünür May 19 '12 at 03:27

score 0 · Answer 3 · answered May 18 '12 at 15:28

A not-so-Perlish solution:

$str = 'http://example.com';

if (($url) = $str =~ /http:\/\/(\w+)\.com/) {
    print $url, "\n";
}

answered May 18 '12 at 15:28

kidig

Why was this downvoted without comment? The answer is inelegant, but is at least as correct (and no more correct than) the OP's python example. +1 to compensate. – pilcrow May 19 '12 at 03:05

score -1 · Answer 4 · edited May 18 '12 at 14:01

In your Python snippet you're capturing the text you want with parentheses, but in your Perl snippet you've left them out. Also, the part you want to capture is hard-coded in the Perl pattern instead of being expressed as \w+. Start there.
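One way to apply both hints to the Perl snippet from the question, offered as a sketch rather than the only way to write it: add a capture group around `\w+` and read the result from `$1`. The `m{...}` delimiter is a choice made here to avoid escaping the slashes.

```perl
use strict;
use warnings;

my $str = 'http://example.com';

# Parentheses capture the part between "http://" and ".com";
# \. matches a literal dot (an unescaped . would match any character).
my $result;
if ($str =~ m{http://(\w+)\.com}) {
    $result = $1;          # $1 holds the text of the first capture group
    print "$result\n";     # prints "example"
}
```

Checking the match inside `if` matters because `$1` keeps its value from an earlier successful match when the current one fails.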

edited May 18 '12 at 14:01

Rory Hunter

answered May 18 '12 at 10:06

w.k
