
How do I do a replace that will turn all plain addresses in a paragraph into links?

The problem is that periods are valid in an address, but an address may also come at the end of a sentence.

Test string:

The link is: http://www.google.com/pants.  And that is the link.  

I need to capture everything from http to pants. (It is plain text and I need to turn it into HTML.)

This regex also grabs the period after pants, which is wrong:

(^|[\n ])([\w]+?://[^\s]*)

I'm pretty sure I need a lookahead, but I can't put one inside the [^\s] character class. I haven't managed to get an if-then-else (conditional) to work either.

Here is my replacement string:

$1<a href=\"$2\" target=\"_blank\">$2</a>
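To make the problem concrete, here is a minimal JavaScript sketch (JavaScript is an assumption here, chosen only so the snippet is easy to run) that applies the pattern above to the test string and collects what group 2 captures:

```javascript
// Test string from the question.
const text = "The link is: http://www.google.com/pants.  And that is the link.";

// The pattern from the question. [^\s]* is greedy and only stops at
// whitespace, so the sentence-ending period ends up inside group 2.
const greedy = /(^|[\n ])([\w]+?:\/\/[^\s]*)/g;

const captured = [];
for (const m of text.matchAll(greedy)) {
  captured.push(m[2]);
}

console.log(captured); // [ 'http://www.google.com/pants.' ]
```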

Hey, people reading this. Make sure you know this site: http://gskinner.com/RegExr/ It rules. It's the only reason I get any regex right.

CBGraham

2 Answers


Assuming there are no spaces in the URLs and each one is followed by whitespace or the end of the string:

str = str.replace(
    /(https?:\/\/\S+?)(?=\.?(\s|$))/g,
    '<a href="$1" target="_blank">$1</a>'
);

It captures http:// or https:// followed by as few non-space characters as possible, stopping at the first position where the lookahead sees an optional . followed by whitespace or the end of the string.
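As a quick check, the following sketch runs that replacement on the test string from the question. The variable names are illustrative only:

```javascript
const text = "The link is: http://www.google.com/pants.  And that is the link.";

// (https?:\/\/\S+?)  capture http:// or https:// plus a lazy run of non-whitespace
// (?=\.?(\s|$))      stop where an optional period is followed by whitespace or the end
const linked = text.replace(
  /(https?:\/\/\S+?)(?=\.?(\s|$))/g,
  '<a href="$1" target="_blank">$1</a>'
);

console.log(linked);
// The link is: <a href="http://www.google.com/pants" target="_blank">http://www.google.com/pants</a>.  And that is the link.
```

Because the lookahead only accepts a period that is directly followed by whitespace or the end of the string, periods inside the address (as in www.google.com) stay part of the match.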

If you want to exclude other punctuation that may follow a URL, you can change the positive lookahead accordingly, e.g. (?=[;:!,.]?(?:\s|$)).

Note that the above regex is not intended to match only valid URLs. You may want to replace the \S with [\w/.-] so that it only matches URLs made up of word characters and the characters . - and /.
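A small sketch of that variant, using made-up example.com addresses (an assumption for illustration, not from the question):

```javascript
// The lookahead now allows one optional character from ; : ! , . before
// the whitespace or end of string that terminates the URL.
const trailing = /(https?:\/\/\S+?)(?=[;:!,.]?(?:\s|$))/g;

const text =
  "Try http://example.com/a, then http://example.com/b! Or http://example.com/c.d;";

const urls = text.match(trailing);
console.log(urls);
// [ 'http://example.com/a', 'http://example.com/b', 'http://example.com/c.d' ]
```

Only a single trailing punctuation character is skipped, so something like a closing parenthesis followed by a period would still end up inside the match.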
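One consequence of the stricter character class is worth seeing: a URL containing any other character, such as ? or =, is skipped entirely, because the lookahead can never be satisfied for it. A rough comparison, again with hypothetical example.com addresses:

```javascript
const text =
  "See http://example.com/docs/v1.2. Not linked: http://example.com/?q=1";

// Loose version: any non-whitespace character may appear in the URL.
const loose = /(https?:\/\/\S+?)(?=\.?(\s|$))/g;

// Strict version: only word characters, slash, period and hyphen.
const strict = /(https?:\/\/[\w\/.-]+?)(?=\.?(\s|$))/g;

const looseUrls = text.match(loose);
const strictUrls = text.match(strict);

console.log(looseUrls);
// [ 'http://example.com/docs/v1.2', 'http://example.com/?q=1' ]
console.log(strictUrls);
// [ 'http://example.com/docs/v1.2' ]
```

Which behaviour is preferable depends on the input; if query strings are expected, the character class would need to be widened.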

See also: In search of the perfect URL validation regex

MikeM
  • Yay! Thank you so much. Let's see if I can dissect that for future generations. – CBGraham Jan 23 '13 at 17:56
  • I can't hit return? Frickin... hold on. – CBGraham Jan 23 '13 at 17:56
  • Just noticed the php preg-replace tag, and my answer uses JavaScript - apologies. – MikeM Jan 23 '13 at 18:06
  • https? -- http with an optional s. :\/\/ -- literal text with escaped slashes. \S+? -- one or more non-whitespace, lazy (tight so the next bit will end the match) ?=[.,?!;:] -- If we are looking at punctuation ? -- the lookahead we just saw is part of an if statement () -- then clause \s|$ -- if there is whitespace or the end of the string then this is the end of the match. Is that pretty much right? (I can't have tabs or multiple spaces? Is this China?) – CBGraham Jan 23 '13 at 18:09

Here are some regular expression patterns for URLs:

^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$

^[a-zA-Z0-9\-\.]+\.(com|org|net|mil|edu|COM|ORG|NET|MIL|EDU)$

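You can use them like this, after dropping the ^ and $ anchors so the pattern can match inside a longer string. The patterns have no group around the whole URL, so the replacement uses $& (the whole match) rather than $1:

// PATTERN: the first expression above without its ^ and $ anchors
var PATTERN = /http:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;
str = str.replace(PATTERN, '<a href="$&" target="_blank">$&</a>');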

You'll find tons of them at http://regexlib.com/
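As written, the first pattern in this answer starts with ^ and ends with $, so it checks whether an entire string is a URL rather than finding URLs inside a paragraph. A short sketch of that behaviour, using the test string from the question:

```javascript
// The first pattern, anchors included.
const whole = /^http:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/\S*)?$/;

const bareUrl = whole.test("http://www.google.com/pants");
const inSentence = whole.test("The link is: http://www.google.com/pants.");

console.log(bareUrl);    // true
console.log(inSentence); // false
```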

Daniele B