preg_replace to change url from relative to absolute

Question

My PHP code is:

$string = preg_replace('/(href|src)="([^:"]*)(?:")/i','$1="http://mydomain.com/$2"', $string);

It works with:

 - <a href="aaa/">Link 1</a> => <a href="http://mydomain.com/aaa/">Link 1</a>
 - <a href="http://mydomain.com/bbb/">Link 1</a> => <a href="http://mydomain.com/bbb/">Link 1</a>

But not with:

- <a href='aaa/'>Link 1</a>
- <a href="#top">Link 1</a> (I don't want to change the URL if it starts with #).

Please help me!

score 2 · Answer 1 · answered Oct 15 '13 at 07:51

How about:

$arr = array('<a href="aaa/">Link 1</a>',
             '<a href="http://mydomain.com/bbb/">Link 1</a>',
             "<a href='aaa/'>Link 1</a>",
             '<a href="#top">Link 1</a>');
foreach( $arr as $lnk) {
    $lnk = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://mydomain.com/$3"', $lnk);
    echo $lnk,"\n";
}

output:

<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="http://mydomain.com/bbb/">Link 1</a>
<a href="http://mydomain.com/aaa/">Link 1</a>
<a href="#top">Link 1</a>

Explanation:

The regular expression:

(?-imsx:(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    href                     'href'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    src                      'src'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  =                        '='
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    ["\']                    any character of: '"', '\''
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    #                        '#'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    http://                  'http://'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
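    [^\2]*                   any character except: '\2' (0 or more
                             times (matching the most amount
                             possible)); note that inside a character
                             class '\2' is the octal escape for
                             character code 2, not a back-reference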
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  \2                       what was matched by capture \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

I can't figure out how to make this work in a greedy fashion for a multi-line blob of HTML. I've tried the 'm' modifier but no luck. Can you help? — Adam Friedman, Nov 10 '15 at 03:20
Thanks, the regex should be changed a bit to be ungreedy and to cover https too. `'~(href|src)=(["\'])(?!#)(?!https?://)/?([^\2]*?)\2~i'` — Jako, Jul 18 '17 at 22:51
@Jako: You're right for `https?` but `[^\2]*` doesn't need to be ungreedy because it is ungreedy by itself. — Toto, Jul 19 '17 at 08:39
Try your regex with two urls in one line: https://regex101.com/r/5Q8cye/1 — Jako, Jul 20 '17 at 12:21
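The two-URLs-on-one-line case from the comments can be handled without relying on `[^\2]`. PCRE reads `\2` inside a character class as an octal escape, so that class is effectively "almost any character" and the greedy `*` runs to the last matching quote on the line. Below is a minimal sketch, not the answerer's code: `absolutize_links` and its `$base` parameter are hypothetical names, and the sample HTML is made up. It replaces the class with `(?:(?!\2).)*`, which stops at the same quote that opened the value, and uses `https?` as suggested above. Other schemes (`mailto:`, protocol-relative `//host/...`) are not covered.

```php
<?php
// Hypothetical helper (not from the thread): prefix relative href/src values with $base.
function absolutize_links($html, $base)
{
    // (?:(?!\2).)* consumes characters until the quote captured in group 2,
    // so several links on one line are matched one at a time.
    $pattern = '~(href|src)=(["\'])(?!#)(?!https?://)((?:(?!\2).)*)\2~i';

    return preg_replace_callback($pattern, function ($m) use ($base) {
        // $m[1] is the attribute name, $m[3] the original relative value.
        return $m[1] . '="' . $base . $m[3] . '"';
    }, $html);
}

// Made-up sample: two links on the first line, untouched cases on the second.
$html = '<a href="aaa/">1</a> <a href=\'bbb/\'>2</a>' . "\n"
      . '<a href="#top">3</a> <img src="https://mydomain.com/c.png">';

echo absolutize_links($html, 'http://mydomain.com/'), "\n";
```

Because `preg_replace_callback` scans the whole subject string, a multi-line blob of HTML needs no extra modifier here; the `m` modifier only changes how `^` and `$` behave.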

score 0 · Answer 2 · answered Oct 15 '13 at 03:58

This will work for you

PHP:

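function expand_links($link) {
    // Strip the surrounding quotes and any leading or trailing slashes.
    return 'href="http://example.com/' . trim($link, '\'"/\\') . '"';
}

// The /e modifier was removed in PHP 7, so use preg_replace_callback instead.
$textarea = preg_replace_callback('/href\s*=\s*(?<href>"[^\\"]*"|\'[^\\\']*\')/',
    function ($m) { return expand_links($m['href']); }, $textarea);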

I also changed the regex to work with either double quotes or apostrophes.

score 0 · Answer 3 · gwillie · 2013-10-15T05:35:44.453

try this for your pattern

/(href|src)=['"]([^"']+)['"]/i

the replacement stays as is

EDIT:

wait one...i didn't test on the first 2 link types, just the ones that didn't work...give me a moment

REVISED:

sorry about the first regex, i forgot about the second example that worked with the domain in it

(href|src)=['"](?:http://.+/)?([^"']+)['"]

that should work
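Since the first attempt went untested against some of the link types, here is a rough sketch of a check that runs a candidate pattern against all four links from the question. `failing_cases` is a hypothetical helper, the expected outputs are my reading of what the question asks for (double quotes in the result are an assumption), and the pattern is the revised one above wrapped in `~` delimiters so the slashes need no escaping.

```php
<?php
// Input => desired output, based on the four links in the question.
$cases = array(
    '<a href="aaa/">Link 1</a>'                     => '<a href="http://mydomain.com/aaa/">Link 1</a>',
    '<a href="http://mydomain.com/bbb/">Link 1</a>' => '<a href="http://mydomain.com/bbb/">Link 1</a>',
    "<a href='aaa/'>Link 1</a>"                     => '<a href="http://mydomain.com/aaa/">Link 1</a>',
    '<a href="#top">Link 1</a>'                     => '<a href="#top">Link 1</a>',
);

// Hypothetical helper: returns the inputs whose rewritten form differs from the expectation.
function failing_cases($pattern, $replacement, array $cases)
{
    $failed = array();
    foreach ($cases as $input => $expected) {
        if (preg_replace($pattern, $replacement, $input) !== $expected) {
            $failed[] = $input;
        }
    }
    return $failed;
}

// The revised pattern from this answer, with delimiters and the i modifier added.
$pattern = '~(href|src)=[\'"](?:http://.+/)?([^"\']+)[\'"]~i';
$failed  = failing_cases($pattern, '$1="http://mydomain.com/$2"', $cases);

print_r($failed);
```

Run this way, the first three links come out as expected, but the `#top` link is still rewritten, so the pattern would need something like a `(?!#)` look-ahead to meet that requirement.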

edited Oct 15 '13 at 05:35

answered Oct 15 '13 at 05:22

gwillie
