0

I want to match a string that begins with http:// and ends with .com, but print only the part in between, without the http:// and the .com.

$str = "http://example.com";
$str =~ /http:\/\/example.com/;
$result = "$&\n";
print $result;

It should do essentially the same as this Python code:

#!/usr/bin/python
import re
str = 'http://example.com'
search = re.search(r'http://(\w+).com', str)
if search:
  print search.group(1)

This prints only "example". How can I do the same in Perl?

Sinan Ünür
  • Why was this question, as were many of its ostensibly correct answers, downvoted without comment? The OP may be weak on applied regexen, and regexen may be a poor tool for the job, but the question strikes me as quite legitimate. +1 to compensate. – pilcrow May 19 '12 at 03:10
  • @pilcrow Multiple question marks, leaning toothpicks, and not using capturing in the Perl version even though it is present in the Python version indicates a lack of effort to me. That's the reason for my `-1`. I improved the title, but there is no reason to reward posters who don't take the time to compose a decent question. – Sinan Ünür May 19 '12 at 03:38
  • @SinanÜnür, thank you for outlining your reasons. I agree, FWIW. As a separate matter, and IMHO, downvoting *without explanation* is the worse crime. Silent downvotes are merely punitive to the poster, whereas articulated downvotes are remedial. – pilcrow May 19 '12 at 03:54

4 Answers

3

Robust solution with a specialised parser:

use feature 'say';
use strict; use warnings;

use URI;
use URI::Find;

URI::Find->new(sub {
    my $uri = shift;
    say $uri->host =~ m{(\w+)[.]com\z};
})->find(\ (my $x = q{http://example.com/}) );
Sinan Ünür
daxim
  • I got an error: Can't locate URI/Find.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl .) – user1070579 May 18 '12 at 10:23
  • From the [Stack Overflow Perl FAQ](http://stackoverflow.com/questions/tagged/perl?sort=faq): [What's the easiest way to install a missing Perl module?](http://stackoverflow.com/questions/65865/whats-the-easiest-way-to-install-a-missing-perl-module) – daxim May 18 '12 at 10:36
  • ① Not quite correct. As written (assuming `$text = 'http://example.com'`), this code prints "example.com1\n". You don't mean to assign the return value of `print` to `$host`. ② The call to `URI->new` is superfluous, as `$uri` is already a subclass of `URI`. ③ This approach *modifies* `$text`, which may or may not be acceptable but is certainly worth noting. – pilcrow May 19 '12 at 02:56
  • Update: Sinan Ünür has remedied the above. – pilcrow May 19 '12 at 03:57
0

Try this simple code:

$str = 'http://example.com'; 
print "$_\n" for $str =~ m{\A http:// (\w+) [.] com \z}x;

To ensure the whole string matches, anchor the pattern at the beginning with \A and at the end with \z. Use a pattern delimiter other than / to avoid leaning toothpick syndrome, and use the /x modifier to make your pattern more readable.

You need to use (...) to capture the part you want to extract.

You can test this code on ideone.com
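As a minimal sketch of the points above, the same pattern can be wrapped in a small subroutine. The name `extract_host_label` and the extra sample inputs are made up for illustration; they are not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part between "http://" and ".com",
# or undef when the whole string does not have that shape.
sub extract_host_label {
    my ($str) = @_;
    # \A and \z anchor the match to the entire string, m{...} avoids
    # escaping the slashes, and /x allows whitespace inside the pattern.
    # In list context the match returns the captured groups.
    my ($label) = $str =~ m{\A http:// (\w+) [.] com \z}x;
    return $label;
}

for my $candidate ('http://example.com', 'http://example.com/path', 'ftp://example.com') {
    my $label = extract_host_label($candidate);
    print defined $label ? "$candidate -> $label\n" : "$candidate -> no match\n";
}
```

Only the first candidate matches, because the anchors reject anything before http:// or after .com.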

Sinan Ünür
Ωmega
  • @SinanUnur, your edit went well beyond merely substituting `\w+` for `.*`. Indeed, it broke the code, which was, if nothing else, at least correct for the contrived example given. (Now it fails to compile with *"Search pattern not terminated..."*) I'd hate to think this user was downvoted because of it. +1 to compensate. – pilcrow May 19 '12 at 03:01
  • @pilcrow You have enough rep to fix a simple typo in my edit. The edit msg pointed out the most important aspect of the edit. I still think @daxim's answer has the right idea, so I do not see the point of voting this answer up. While I did not vote it down either, I can see why someone might have thought using `URI` is better than fiddling with regex patterns. – Sinan Ünür May 19 '12 at 03:27
0

A not-so-Perlish solution:

$str = 'http://example.com';

if (($url) = $str =~ /http:\/\/(\w+)\.com/) {
    print $url, "\n";
}
kidig
  • Why was this downvoted without comment? The answer is inelegant, but is at least as correct (and no more correct than) the OP's python example. +1 to compensate. – pilcrow May 19 '12 at 03:05
-1

In your Python snippet you capture the text you want with parentheses, but in your Perl snippet you've left them out. Also, the part you want to capture is hard-coded as example instead of expressed as \w+. Start with those two points.
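To make the difference concrete, here is a small sketch contrasting `$&`, which holds the entire match, with `$1`, which holds only what the first pair of parentheses captured. The subroutine name `match_parts` is my own and purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (whole match, first capture),
# or an empty list when the pattern does not match.
sub match_parts {
    my ($str) = @_;
    # Parentheses around \w+ create capture group 1; the dot is escaped
    # so it matches a literal "." rather than any character.
    return unless $str =~ m{http://(\w+)\.com};
    return ($&, $1);
}

my ($whole, $captured) = match_parts('http://example.com');
print "whole match: $whole\n";     # whole match: http://example.com
print "capture 1:   $captured\n";  # capture 1:   example
```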

Rory Hunter
w.k