I would like to remove the "http://" part from the visible text of these autogenerated links. Here is an example of the current output:

http://google.com/search?gc...

Here are the regexes I am using in PHP to generate these links from a URL.

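    // Step 1: wrap each run of non-whitespace characters (the URL) in an anchor tag.
    $patterns_sp[5] = '~([\S]+)~';
    $replaces_sp[5] = '<a href="\1" target="_blank">\1<br/>';

    // Step 2: keep the first 25 characters of the link text, then close the tag.
    $patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';
    $replaces_sp[6] = '\1...</a><br/>';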

When these patterns are run on a URL like this:

http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex

the regexes give me:

    <a href="http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex" target="_blank">http://google.com/search?gc...</a>

Where I am stuck:

I see no obvious reason why I cannot change the $patterns_sp[6] line to read like this:

    $patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';  

However, the output still contains the "http://" part of the address, which makes a long list of these links look very redundant. What I am left with is the same as the first example.
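The behavior described above can be reproduced in isolation. A lookbehind is zero-width: the text it asserts is not part of the match, so the replacement leaves it in place. The sketch below contrasts the lookbehind version with a variant that matches the scheme outside the capture group. The `$html` input and the variable names `$kept` and `$removed` are illustrative assumptions, not part of the original code.

```php
<?php
// Assumed intermediate result after the first replacement step.
$html = '<a href="http://www.google.com/search?gcx=c&ix=c1" target="_blank">'
      . 'http://www.google.com/search?gcx=c&ix=c1<br/>';

// Lookbehind version: "http://" is only asserted, never matched,
// so it stays in the output untouched.
$kept = preg_replace('~(?<=>http://)(\S{1,25})\S+~', '$1...</a><br/>', $html);

// Consuming version: the scheme is part of the match but outside the
// capture group, so the replacement drops it from the link text.
$removed = preg_replace('~(?<=>)https?://(\S{1,25})\S+~', '$1...</a><br/>', $html);

echo $kept, "\n";    // link text still starts with http://
echo $removed, "\n"; // link text starts with www.google.com
```

The href attribute is unaffected in both cases, because the URL inside it is preceded by a quote rather than by ">".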

Ryan Ward Valverde
  • Please see [this answer](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). You need to use an HTML parser, not a regex, to deal with HTML. – jb. Sep 23 '11 at 00:19

2 Answers

Replace...

    $patterns_sp[5] = '~([\S]+)~';

...with...

    $patterns_sp[5] = '~^(?:https?|ftp)://([\S]+)~';

Then you can access the protocol-less version with $1 and the whole link with $0.

Optionally, you can remove a leading protocol with something like...

    preg_replace('~^(?:https?|ftp)://~', '', $str);
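As a rough sketch of how $0 and $1 could be combined in one replacement, the snippet below builds the anchor with the full URL in href and the protocol-less version as the link text. It assumes the pattern also consumes the "//" after the scheme, and the variable names `$url`, `$link`, and `$label` are illustrative only.

```php
<?php
$url = 'http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex';

// $0 is the whole match (used for href); $1 is everything after the scheme.
$link = preg_replace(
    '~^(?:https?|ftp)://(\S+)~',
    '<a href="$0" target="_blank">$1</a><br/>',
    $url
);

// Stripping only the leading scheme, without building a link.
$label = preg_replace('~^(?:https?|ftp)://~', '', $url);

echo $link, "\n";
echo $label, "\n";
```

Because the pattern is anchored with ^, this only handles input that begins with the URL; text containing several URLs would need a different anchor.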
alex
  • The only way I can think to do this is to create a new pattern. The way I have constructed it keeps the original address intact and only modifies the text between the <a> and </a> tags. While what you suggest would work, creating another expression seems like it could be burdensome. – Ryan Ward Valverde Sep 23 '11 at 00:08

I suggest not writing your own regex. Instead, have a look at parse_url: http://php.net/manual/en/function.parse-url.php

Retrieve the components of the URL, then compose a new version that only contains the parts you want.
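A minimal sketch of that idea, assuming PHP 7 or later: the helper `displayLabel()` and its `$maxLength` parameter are hypothetical names introduced here for illustration. It keeps only the host, path, and query (dropping the scheme, port, credentials, and fragment) and truncates the result the same way the question does.

```php
<?php
// Hypothetical helper: builds a scheme-less, truncated label for display.
function displayLabel(string $url, int $maxLength = 25): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // nothing parse_url can split into a host; leave unchanged
    }

    // Recompose only the parts wanted in the visible text.
    $label = $parts['host']
           . ($parts['path'] ?? '')
           . (isset($parts['query']) ? '?' . $parts['query'] : '');

    if (strlen($label) > $maxLength) {
        $label = substr($label, 0, $maxLength) . '...';
    }
    return $label;
}

$url = 'http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex';

// The full URL goes into href; only the shortened label is shown.
$link = '<a href="' . htmlspecialchars($url) . '" target="_blank">'
      . htmlspecialchars(displayLabel($url)) . '</a><br/>';

echo $link, "\n";
```

Unlike the regex version, this escapes the URL with htmlspecialchars() before placing it in the markup, so "&" in the query string becomes "&amp;" in the href.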

bradley.ayers