Getting the website title from a link in a string

Question

string: "Here is the badges, https://stackoverflow.com/badges bla bla bla"

If the string contains a link (see above), I want to get the website title of that link.

It should return: Badges - Stack Overflow.

How can I do that?

Thanks.

score 6 · Accepted Answer · edited Apr 04 '11 at 09:00

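#!/usr/bin/perl -w

require LWP::UserAgent;

# Create a user agent with a 10-second timeout that honours
# proxy settings from the environment (http_proxy etc.).
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;

my $response = $ua->get('http://search.cpan.org/');

if ($response->is_success) {
    # title() returns the contents of the page's <title> element
    print $response->title(), "\n";
}
else {
    die $response->status_line;
}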

See LWP::UserAgent. Cheers :-)
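
The code above fetches a fixed URL, while the question needs the URL pulled out of a string first. Here is a minimal sketch of both steps together, assuming a deliberately naive pattern (`https?://\S+`) and two hypothetical helper names, `extract_urls` and `fetch_title`. The text is read from the command line, so nothing is fetched unless a string is supplied. For `https` links, LWP also needs LWP::Protocol::https installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: return every http(s) URL found in a string.
# The pattern is naive: it stops at whitespace, then trims trailing
# punctuation that more likely belongs to the sentence than to the URL.
sub extract_urls {
    my ($text) = @_;
    my @urls = $text =~ m{(https?://\S+)}g;
    s/[.,;:!?)]+\z// for @urls;
    return @urls;
}

# Hypothetical helper: fetch a page and return its <title>,
# or undef if the request fails or the page has no title.
sub fetch_title {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);
    $ua->env_proxy;
    my $response = $ua->get($url);
    return $response->is_success ? $response->title : undef;
}

# Usage: perl title.pl "Here is the badges, https://stackoverflow.com/badges bla bla bla"
my $text = shift(@ARGV) // '';
for my $url (extract_urls($text)) {
    my $title = fetch_title($url);
    print "$url: ", (defined $title ? $title : '(no title)'), "\n";
}
```

A dedicated module such as URI::Find is likely to handle more edge cases than this pattern (for example, URLs that legitimately end in a closing parenthesis).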

edited Apr 04 '11 at 09:00 by daxim

answered Apr 03 '11 at 21:24 by nc3b

Thank you, this is awesome, but I need to catch the link myself :) I can't define it in advance. If the string contains a link, then I need to get its title. :) – wonnie Apr 03 '11 at 21:27
There are better regexes for this, but here's a simple, **flawed** example: `$str =~ m{(http://\S*)};` – nc3b Apr 03 '11 at 21:47
I'd prefer `use` instead of `require`, as `use` is evaluated at compile-time; `require` is evaluated at run-time. – Dec 01 '14 at 23:14

score 6 · Answer 2 · answered Apr 03 '11 at 22:47

I use URI::Find::Simple's list_uris method and URI::Title for this.
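
A minimal sketch of how those two modules might be combined, assuming both are installed from CPAN and that the string is the one from the question. `title` performs a network request, so it is wrapped in `eval` here and its result is checked for definedness in case no title could be determined.

```perl
use strict;
use warnings;
use URI::Find::Simple qw(list_uris);
use URI::Title qw(title);

my $text = 'Here is the badges, https://stackoverflow.com/badges bla bla bla';

# list_uris returns every URI it can find in the text.
for my $uri (list_uris($text)) {
    # title() goes out to the network; guard against failures.
    my $title = eval { title($uri) };
    print defined $title ? "$title\n" : "No title found for $uri\n";
}
```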

answered Apr 03 '11 at 22:47 by ysth

score 1 · Answer 3 · edited May 23 '17 at 11:55

Depending on how the link is given and how you define the title, you need one approach or another.

In the exact scenario you have presented, get the URL with URI::Find, HTML::LinkExtractor, etc.; then `my $title = URI->new($link)->path()` gives you the path of the link (`/badges` here), which you can treat as the title.
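
To make concrete what the URI module returns for the link in the question, here is a short sketch (the variable names are only illustrative):

```perl
use strict;
use warnings;
use URI;

my $uri = URI->new('https://stackoverflow.com/badges');

print $uri->host, "\n";    # stackoverflow.com
print $uri->path, "\n";    # /badges

# path_segments returns an empty leading segment for an absolute path,
# so the last element is the final piece of the path.
my @segments = $uri->path_segments;
print $segments[-1], "\n"; # badges
```

Note that this yields a piece of the URL, not the page's own title ("Badges - Stack Overflow"); getting the latter requires fetching the page.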

But if the website title is the linked text, as in <a href="https://stackoverflow.com/badges"> badged</a>, then the question "How can I extract URL and link text from HTML in Perl?" will give you the answer.

If the title is encoded in the link itself and the link is also its own link text, how do you define the title?

Do you want the last bit of the URI before any query? What happens with the queries set as URL paths?
Do you want the part between the host and the query?
Do you want to parse the link source and retrieve the title tag if any?
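
For the last option, retrieving the title tag from the link's source, one possible sketch uses HTML::HeadParser (part of the HTML::Parser distribution). The HTML string below is a made-up stand-in for a downloaded page, not real output from the site.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Stand-in for a downloaded page; in practice this would be the response body.
my $html = <<'HTML';
<html>
  <head><title>Badges - Stack Overflow</title></head>
  <body><p>bla bla bla</p></body>
</html>
HTML

my $parser = HTML::HeadParser->new;
$parser->parse($html);

# The parsed <title> is exposed as a pseudo-header named 'Title'.
my $title = $parser->header('Title');
print defined $title ? "$title\n" : "(no title tag)\n";
```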

As always, going from a trivial first implementation to one that covers all corner cases is a daunting task ;-)
