
string: "Here is the badges, https://stackoverflow.com/badges bla bla bla"

If the string contains a link (as above), I want to get the title of the web page that link points to.

It should return: Badges - Stack Overflow

How can I do that?

Thanks.

wonnie

3 Answers

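#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);   # give up after 10 seconds
$ua->env_proxy;     # honour http_proxy, https_proxy, etc. from the environment

my $response = $ua->get('http://search.cpan.org/');

if ($response->is_success) {
    # LWP parses the <head> of HTML responses, so the page's <title>
    # is available through the response's title() method.
    print $response->title(), "\n";
}
else {
    die $response->status_line;
}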

See LWP::UserAgent. Cheers :-)

daxim
nc3b
  • Thank you, that's awesome, but I need to catch that link :) It is not a URL I define myself. If the string contains a link, then I need to get its title. :) – wonnie Apr 03 '11 at 21:27
  • There are better regexes for this, but here's a simple, **flawed** example: `$str =~ m{(http://\S*)};` – nc3b Apr 03 '11 at 21:47
  • I'd prefer `use` instead of `require`, as `use` is evaluated at compile-time; `require` is evaluated at run-time. –  Dec 01 '14 at 23:14
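A minimal sketch of the combination asked for in the comments: find the links in a string with a simple regex, then fetch each page and read its <title>. It is an assumption-laden illustration, not the answerer's code: it swaps LWP for HTTP::Tiny (core since Perl 5.14; https URLs also need IO::Socket::SSL installed), the helper names find_urls, title_from_html and fetch_title are hypothetical, and both regexes are deliberately naive (trailing punctuation stays in the URL, HTML entities in the title are not decoded).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;   # core module since Perl 5.14

# Hypothetical helper: return every http:// or https:// URL in $text.
# Naive on purpose: a URL runs until the next whitespace character.
sub find_urls {
    my ($text) = @_;
    return $text =~ m{(https?://\S+)}g;
}

# Hypothetical helper: contents of the first <title> element, or undef.
# A regex is fine for a quick sketch; use a real HTML parser for
# arbitrary markup.
sub title_from_html {
    my ($html) = @_;
    return unless defined $html && $html =~ m{<title[^>]*>(.*?)</title>}is;
    my $title = $1;
    $title =~ s/\s+/ /g;        # collapse newlines and runs of spaces
    $title =~ s/^\s+|\s+$//g;   # trim both ends
    return $title;
}

# Hypothetical helper: fetch $url and return its <title>, or undef on failure.
sub fetch_title {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 10)->get($url);
    return unless $res->{success};
    return title_from_html($res->{content});
}

# Usage: perl titles.pl "Here is the badges, https://stackoverflow.com/badges bla bla bla"
for my $url (map { find_urls($_) } @ARGV) {
    my $title = fetch_title($url) // 'no title found';
    print "$url: $title\n";
}
```

With no arguments the script does nothing, so the helpers can be reused or tested without touching the network.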

I use URI::Find::Simple's list_uris method and URI::Title for this.
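A sketch of that approach, assuming both CPAN modules are installed. It relies only on the functions the modules export on request (list_uris from URI::Find::Simple, title from URI::Title); the command-line handling and output format are illustrative choices, not part of either module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI::Find::Simple qw(list_uris);   # CPAN
use URI::Title qw(title);              # CPAN

# Print the title of every URI mentioned in each command-line argument.
# Usage: perl titles.pl "Here is the badges, https://stackoverflow.com/badges bla bla bla"
for my $text (@ARGV) {
    for my $uri (list_uris($text)) {
        my $title = title($uri) // '(no title)';   # title() fetches the page
        print "$uri: $title\n";
    }
}
```

URI::Find does the link detection, so it copes with more URL shapes than a hand-written regex.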

ysth

Depending on how the link is given and how you define "title", you need one approach or another.

In the exact scenario that you have presented, you can get the URL with URI::Find, HTML::LinkExtractor, etc., and then `my $title = URI->new($link)->path();` gives you the path part of the link (`/badges` for the example URL), which you can use as a title.

But if the website title is the link text, as in <a href="https://stackoverflow.com/badges"> badged</a>, then the answers to "How can I extract URL and link text from HTML in Perl?" will give you what you need.
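For the link-text case, one option is HTML::TokeParser (part of the HTML-Parser distribution): get_tag('a') returns each <a> start tag with its attributes, and get_trimmed_text('/a') returns the whitespace-trimmed text up to the closing </a>. A sketch; links_with_text is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;   # ships with the HTML-Parser distribution

# Hypothetical helper: return [href, link text] pairs for every <a href="...">.
sub links_with_text {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);
    my @links;
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};              # attribute hash of the start tag
        next unless defined $href;               # skip <a name="..."> anchors
        my $text = $p->get_trimmed_text('/a');   # text up to the closing </a>
        push @links, [ $href, $text ];
    }
    return @links;
}

for my $link (links_with_text('<a href="https://stackoverflow.com/badges"> badged</a>')) {
    print "$link->[0] => $link->[1]\n";
}
```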

If the title is encoded in the URL itself and the URL is also the link's text, how do you define the title?

  1. Do you want the last segment of the URI path, before any query string? And what about query parameters that are written as path segments?
  2. Do you want the whole path between the host and the query string?
  3. Do you want to fetch the linked page and retrieve its <title> tag, if any?
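Options 1 and 2 can be sketched with the URI module (normally installed alongside LWP); option 3 needs the page itself, as in the LWP::UserAgent answer. The helper names and the example.com URL are hypothetical. Note that the last URL shows the corner case from option 1: a query written as path segments just yields its last value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Option 1 (hypothetical helper): last non-empty path segment, query ignored.
sub last_segment {
    my ($url) = @_;
    my @segments = grep { length } URI->new($url)->path_segments;
    return @segments ? $segments[-1] : '';
}

# Option 2 (hypothetical helper): everything between the host and the query.
sub path_part {
    my ($url) = @_;
    return URI->new($url)->path;
}

for my $url ('https://stackoverflow.com/badges',
             'https://example.com/docs/perl/badges?tab=all',
             'https://example.com/search/tab/all') {
    printf "%s\n  option 1: %s\n  option 2: %s\n",
        $url, last_segment($url), path_part($url);
}
```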

As always, going from a trivial first implementation to one that covers all the corner cases is a daunting task ;-)

Pablo Marin-Garcia