
Let's say I have a big RSS feed full of Twitter posts, and they are all plain text. Lots of the posts contain URLs, and I'd like those URLs to be turned into links.

So I've got a variable that is equal to:

Visualization of layoffs by industry, number and date. Looking forward to seeing similar for hiring trends. http://bit.ly/XBW4z

And I'd like it to turn into:

Visualization of layoffs by industry, number and date. Looking forward to seeing similar for hiring trends. <a href="http://bit.ly/XBW4z">http://bit.ly/XBW4z</a>

How could I do that? I am useless when it comes to regex and its ilk, so help is much appreciated!

asked by Eileen, edited by JimmyPena

6 Answers


It depends on what you want to match.

A nice, simple regex is

http\://[a-zA-Z0-9./?&_\-]*

This matches any URL that starts with http:// and continues with only the characters inside the [ ]: a through z, A through Z, 0 through 9, and . / ? & _ -.
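Applying the pattern in PHP (which Eileen asks about in the comments) comes down to one preg_replace call. Below is a minimal sketch; the function name linkify_simple and the sample text are illustrative only. The pattern uses @ as its delimiter so the slashes need no escaping, and $0 in the replacement stands for the whole match.

```php
<?php
// Sketch: wrap every http:// URL matched by the simple pattern in an <a> tag.
// '@' is the delimiter, so '/' needs no escaping; ':' is not special in PCRE,
// so the '\:' from the answer is written as a plain ':' here.
function linkify_simple(string $text): string
{
    // $0 in the replacement is the whole matched URL.
    return preg_replace('@http://[a-zA-Z0-9./?&_\-]*@', '<a href="$0">$0</a>', $text);
}

$tweet = 'Looking forward to seeing similar for hiring trends. http://bit.ly/XBW4z';
echo linkify_simple($tweet), "\n";
// prints: Looking forward to seeing similar for hiring trends. <a href="http://bit.ly/XBW4z">http://bit.ly/XBW4z</a>
```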

If you want to match other protocols (https, ftp, etc.), you can use

(http|ftp|anyotherprotocolyouwant)\://[a-zA-Z0-9./?&_\-]*

If you want to support more characters, simply add them to the [].

Update: forgot uppercase support! D'oh
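Here is a sketch of the multi-protocol variant. It rests on a few assumptions of mine rather than the answer's: https is covered with `https?`, the alternation is made non-capturing with `(?:...)` so `$0` is still the whole URL, and `=`, `%` and `#` are added to the [ ] only as an example of extending the character set. linkify_protocols is a hypothetical name.

```php
<?php
// Sketch: protocol alternation plus a slightly larger character class.
// (?:...) groups without capturing; '-' is last in the class so it is literal.
function linkify_protocols(string $text): string
{
    $pattern = '@(?:https?|ftp)://[a-zA-Z0-9./?&_=%#-]*@';
    return preg_replace($pattern, '<a href="$0">$0</a>', $text);
}

echo linkify_protocols('Mirror: ftp://example.com/pub and https://example.com/?q=1#top'), "\n";
```

Adding characters this way widens what counts as "part of the URL"; each addition is a trade-off against swallowing neighbouring text.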

answered by configurator
  • What about digits or other valid characters? – Gumbo Mar 11 '09 at 19:01
  • OK, so that's the regex. How do I actually apply it to my chunk-of-text variable? Am I using regex, or preg_replace, or what? (I did not say I was useless with regex for nothing...) – Eileen Mar 11 '09 at 19:01
  • Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems... – configurator Mar 11 '09 at 19:03
  • HA! My thought is usually, "Oh crap, maybe this needs a regular expression" – Eileen Mar 11 '09 at 19:08
  • I don't think that this expression will handle query parameters properly. – Boden Mar 11 '09 at 19:11

I'd like one too... Check the first link in the search results. It's pretty old!

And by the way, look at the related questions on the right-hand side; there are similar ones, such as "Recognize URL in plain text" and "regex for url and image within a text or html".

answered by Shoban, edited by Community
  • Wow, actually that's the first time I've seen someone not get bashed for posting a Google link. – Tomalak Mar 11 '09 at 19:12
  • @Tomalak I have seen people get shouted at for not using Google search! Show me the question you are talking about; I want to see it ;-) – Shoban Mar 11 '09 at 19:27

Just to add some info: check this class on phpclasses.org, which should solve your problem. It finds the links in the text and converts them as well.

http://www.phpclasses.org/browse/package/6114.html


OK, this question here (regex for url and image within a text or html) has a baffling title, but a helpful answer at the bottom. At least, it works for me and my cases!

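// '-' goes last in [\w.-]: PCRE2 (PHP 7.3+) rejects "[\w-.]" as an invalid range.
// The redundant ( )+ around the host part is dropped to avoid needless backtracking.
$text = preg_replace('@(http://[\w.-]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)@', 
                 '<a href="$1">$1</a>', $text);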
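A self-contained sketch running this pattern on the tweet from the question. The hyphen is moved to the end of the host class (`[\w.-]`) because PCRE2, used since PHP 7.3, refuses `[\w-.]`. The function name linkify_eileen is hypothetical. The second call reproduces the limitation raised in the comments: the match stops at the first character the pattern does not allow.

```php
<?php
// Sketch: the answer's preg_replace call wrapped in a function.
// $1 is the outer capture group, i.e. the whole URL.
function linkify_eileen(string $text): string
{
    $pattern = '@(http://[\w.-]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)@';
    return preg_replace($pattern, '<a href="$1">$1</a>', $text);
}

$tweet = 'Visualization of layoffs by industry, number and date. '
       . 'Looking forward to seeing similar for hiring trends. http://bit.ly/XBW4z';
echo linkify_eileen($tweet), "\n";

// The path class [\w/_.] has no '-', so the link ends before "-bar".
echo linkify_eileen('http://example.com/foo-bar'), "\n";
// prints: <a href="http://example.com/foo">http://example.com/foo</a>-bar
```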
answered by Eileen, edited by Community
  • But only in your case: URLs like `http://example.com/foo-bar` or `http://example.com/foo#bar` are not matched completely. – Gumbo Mar 11 '09 at 19:17
  • @Eileen: Hm... The "helpful answer at the bottom" has been voted -1. This is at least a hint that it could be flawed. – Tomalak Mar 11 '09 at 19:20
  • True, but as I said, it works for all of my cases, AND apart from Boden's it is the only one of the proposed answers that actually shows how to perform the replacement in PHP. Giving me complicated (but perfect!) regex is useless without the PHP to make it work. – Eileen Mar 11 '09 at 20:46

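Look at the preg_replace function. Something like this:

// The outer ( ) are the pattern delimiters, not a capture group, so the whole URL is $0 ($1 would be only the protocol).
// Single quotes avoid PHP escape processing; '-' is last in the class so it is not read as a '+'..'=' range.
$regex_url = '((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?+=\\\\.&-]*)';

$linked = preg_replace($regex_url, '<a href="$0">$0</a>', $your_input_string);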

Regular expression for URL taken from: http://www.geekzilla.co.uk/view2D3B0109-C1B2-4B4E-BFFD-E8088CBC85FD.htm
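Because the outer parentheses in this pattern are PHP's bracket-style delimiters rather than a capture group, the full URL is the whole match ($0). The sketch below goes a step beyond the original answer: it uses preg_replace_callback so each URL passes through htmlspecialchars before it lands in the href and the link text, which keeps characters such as `&` valid in the generated HTML. The function name linkify_escaped is hypothetical.

```php
<?php
// Sketch: same URL pattern, but each match is HTML-escaped in a callback.
function linkify_escaped(string $text): string
{
    // Outer ( ) are delimiters; inside the callback $m[0] is the whole URL.
    $regex_url = '((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:#@%/;$()~_?+=\\\\.&-]*)';

    return preg_replace_callback($regex_url, function (array $m): string {
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);
}

echo linkify_escaped('Read http://example.com/?a=1&b=2 now'), "\n";
// prints: Read <a href="http://example.com/?a=1&amp;b=2">http://example.com/?a=1&amp;b=2</a> now
```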

answered by Boden

There are regular expressions that match valid URLs, for example the complete regular expression for URLs that is derived from the grammar definition of URLs.

But it's better to mark URLs explicitly than to try to find them, because in some situations it cannot be determined whether certain characters are part of the URL or just surrounding text.
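To make the ambiguity concrete: in `(see http://example.com/page).` the closing parenthesis and the period could belong to the URL or to the sentence. The sketch below is a hypothetical heuristic, not what this answer recommends (which is explicit markup). It grabs everything up to whitespace and then hands common trailing punctuation back to the text. It is only a guess: a URL that genuinely ends in `)` or `.` will be cut short.

```php
<?php
// Heuristic sketch: greedy match up to whitespace, then strip trailing
// punctuation back out of the link and re-append it as plain text.
function linkify_trim_punct(string $text): string
{
    return preg_replace_callback('@https?://\S+@', function (array $m): string {
        $url  = rtrim($m[0], '.,;:!?)');          // guessed URL
        $tail = substr($m[0], strlen($url));      // punctuation given back
        return '<a href="' . $url . '">' . $url . '</a>' . $tail;
    }, $text);
}

echo linkify_trim_punct('Details (see http://example.com/page).'), "\n";
// prints: Details (see <a href="http://example.com/page">http://example.com/page</a>).
```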

answered by Gumbo