I have an HTML/text string and want to turn all link-like parts of the text into real hyperlinks with an A tag. For this question I'm trying to match the "www.somesite.domen" pattern. But what if the pattern is followed by a punctuation character at the end of a sentence?

How can I match the pattern without the very last character when that character is punctuation?

  1. www.somesite.domen.
  2. www.somesite.domen,
  3. www.somesite.domen?
  4. www.somesite.domen!
  5. www.somesite.domen/?id=1?

Here is the function I'm using:

function make_links($text)
{
  return  preg_replace(
     array(
        '/(^|\s)(www\.[^<>\s!,]+)(!$|\s|\.|\:|\!|,|\?)/iex'
       ),
     array(
        "stripslashes((strlen('\\2')>0?'\\1<a target=\"_blank\" href=\"http://\\2\">\\2</a>\\3':'\\0'))"
       ),
       $text
   );
}

But when a '.' or '?' character is the last one in the sentence, my function includes it in the link too.

Any idea how to solve these cases? Thanks!

Branislav

1 Answer

If I understand your requirements correctly, you need to break your line of text into 3 groups:

  • The first group will keep the text before the host name
  • The second group will keep the host name
  • The third group will keep the last punctuation character (or whitespace character).

One of the solutions could be as follows:

/^(.*?)(www(?:.\w+)+(?:\/[^.\s]+?))(!$|\s|\.|\:|\!|,|\?)?$/

[Image: regexp explained]

Using some text www.host.some-site.domen/?id=1? as an example, you would get the following matches:

[Image: matching results]

To experiment with your regexp you can use regex101.com.

EDIT

Alternatively, here is another regexp:

/^(.+\s)?(\w+(?:\.[-\w]+)+\.\w+(?:\/.*?)?)(!$|\s|\.|\:|\!|,|\?)?$/

I've performed several tests:

  • Test text: "some stuff www.host.somesite.domen/?id=1." Matching groups:

    • 1: some stuff
    • 2: www.host.somesite.domen/?id=1
    • 3: .
  • Test text: "some stuff www.host.somesite.domain." Matching groups:

    • 1: some stuff
    • 2: www.host.somesite.domain
    • 3: .
  • Test text: "www.host.somesite.domain" Matching groups (only one):

    • 2: www.host.somesite.domain
  • Test text: "hello www.host.somesite.domen/mysite." Matching groups:

    • 1: hello
    • 2: www.host.somesite.domen/mysite
    • 3: .
  • Test text: "www.somesite.domen/?id=1?" Matching groups:

    • 2: www.somesite.domen/?id=1
    • 3: ?
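
The test cases above can be reproduced with a short PHP script. This is only a sketch: it uses the second pattern exactly as written, and `split_link` is a hypothetical helper name introduced for illustration.

```php
<?php
// Second pattern from the answer, copied unchanged.
$pattern = '/^(.+\s)?(\w+(?:\.[-\w]+)+\.\w+(?:\/.*?)?)(!$|\s|\.|\:|\!|,|\?)?$/';

// Hypothetical helper: returns [prefix, link, trailing character],
// or null when the text does not match at all.
function split_link(string $pattern, string $text): ?array
{
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    // preg_match drops trailing groups that did not take part in the match,
    // so default every group to an empty string.
    return [$m[1] ?? '', $m[2] ?? '', $m[3] ?? ''];
}

$tests = [
    'some stuff www.host.somesite.domen/?id=1.',
    'www.host.somesite.domain',
    'hello www.host.somesite.domen/mysite.',
    'www.somesite.domen/?id=1?',
];

foreach ($tests as $text) {
    var_export(split_link($pattern, $text));
    echo PHP_EOL;
}
```

Note that group 1 keeps the whitespace that separates it from the link, because the pattern captures it with `(.+\s)?`.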

I hope that will help to solve your problem.
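
Both patterns above are anchored with `^` and `$`, so they describe a single line that ends with the link. For linking every occurrence inside a longer text, one possible approach (a sketch under assumptions, not the method from this answer) is to match the candidate greedily and then peel trailing punctuation off in a callback. It uses `preg_replace_callback` because the `/e` modifier was deprecated in PHP 5.5 and removed in PHP 7.0. The set of characters treated as trailing punctuation (`.,:;!?`) and the exclusion of quotes from the match are assumptions; unlike the original function, `!` and `,` are allowed in the middle of a URL.

```php
<?php
// Hypothetical rewrite of make_links(); a sketch, not the answer author's code.
function make_links(string $text): string
{
    return preg_replace_callback(
        // Group 1: start of text or one whitespace character.
        // Group 2: "www." followed by a run without whitespace, angle brackets or quotes.
        '/(^|\s)(www\.[^<>\s"\']+)/i',
        function (array $m): string {
            // Strip sentence punctuation from the end of the candidate.
            $url = rtrim($m[2], '.,:;!?');
            // Whatever was stripped is emitted after the link, unchanged.
            $tail = (string) substr($m[2], strlen($url));

            // Nothing usable left after "www" (e.g. "www..."): leave the text alone.
            if (strlen($url) < 5) {
                return $m[0];
            }

            return $m[1]
                . '<a target="_blank" href="http://' . $url . '">' . $url . '</a>'
                . $tail;
        },
        $text
    );
}

echo make_links('This is my website URL www.hostname.com/mysite. Bye!'), PHP_EOL;
```

Because the trimming happens in PHP rather than in the pattern, a `.` or `?` inside the URL is kept and only a trailing run of punctuation is moved outside the link.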

Tom
  • Sorry, the `[ ]` are not part of the text. I have just removed them from the cases. – Branislav Jan 16 '13 at 11:42
  • @Branislav, and what about the host name, is it always www? – Tom Jan 16 '13 at 11:43
  • In this question a link-like string always starts with www., so I have to match as many cases as I can BUT without punctuation at the end. Host name is welcome too. As you know, '.' and '?' can be in a URL but not at the very end. Example: " This is my website URL www.hostname.com/mysite. " – Branislav Jan 16 '13 at 12:08