I am converting an eregi_replace function I found to preg_replace, but the eregi pattern contains just about every character on the keyboard. So I tried £ as the delimiter. It works at the moment, but I wonder whether it might cause problems because it is a non-standard character.

Here is the eregi:

function makeLinks($text) {
    $text = eregi_replace('(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '<a href="\\1">\\1</a>', $text);
    $text = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

and the preg:

function makeLinks($text) {
    $text = preg_replace('£(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('£([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}
Damon

5 Answers

You can use parentheses to delimit a regex rather than a single character, for example:

preg_replace('(abc/def#ghi)i', ...);

That would probably be nicer than trying to find an obscure character that's not (yet) part of your expression.
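As a minimal sketch of the idea (the sample strings and the simplified URL pattern are made up for illustration, not taken from the question), parentheses used as delimiters let `/` and `#` appear unescaped. The delimiter pair does not act as a capture group, so the whole match is referenced as `$0`:

```php
<?php
// Parentheses as delimiters: '/' and '#' inside the pattern need no escaping.
$result = preg_replace('(abc/def#ghi)i', 'X', 'see ABC/DEF#GHI here');

// The outer parentheses are delimiters, not a group, so the whole match is $0.
// Inner parentheses are fine as long as they are balanced.
$linked = preg_replace(
    '((?:f|ht)tp://[-a-z0-9@:%_+.~#?&/=]+)i',
    '<a href="$0">$0</a>',
    'go to http://example.com/a?b=1 now'
);

echo $result, "\n", $linked, "\n";
```

If an existing replacement string uses `\\1` for the whole URL, it would need adjusting after such a change, since the group numbering shifts once the outermost parentheses become delimiters.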

Chris

£ is problematic because it isn't an ASCII character. It's from the Latin-1 charset and will only work as a delimiter if your PHP script is also saved in an 8-bit encoding. Should your file be encoded as UTF-8, then £ will be represented as two bytes, and PCRE in PHP will trip over that, because the delimiter has to be a single byte. (At least my version does.)
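The byte-count difference can be checked directly. The following is only a sketch: the byte sequences are written as escapes so the result does not depend on how this file itself is saved, and the outcome of the `preg_match` call is printed rather than assumed, since it may vary between PHP versions:

```php
<?php
// "\xC2\xA3" is the UTF-8 encoding of £; "\xA3" is its Latin-1 encoding.
$utf8Pound = "\xC2\xA3";
$latin1Pound = "\xA3";

// strlen() counts bytes, not characters.
$utf8Length = strlen($utf8Pound);
$latin1Length = strlen($latin1Pound);
echo "UTF-8: $utf8Length byte(s), Latin-1: $latin1Length byte(s)\n";

// Build a pattern delimited by the two-byte UTF-8 form and see how PCRE reacts.
// The @ suppresses the warning so the return value can be inspected instead.
$pattern = $utf8Pound . 'abc' . $utf8Pound . 'i';
$outcome = @preg_match($pattern, 'ABC');
var_dump($outcome);
```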

mario

You can use the Unicode character, just to be sure.

\u00A3

Watch out for the ereg functions and Unicode support.

http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/characters.html

Long live the Queen.
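One caveat on the notation: PCRE as used by PHP does not accept `\u00A3` inside a pattern. The code-point form is `\x{00A3}` together with the `u` modifier, and it applies to matching £ inside the pattern, not to using it as a delimiter. A small sketch, assuming a UTF-8 subject string invented for the example:

```php
<?php
// Subject contains £ encoded as UTF-8 (bytes C2 A3), followed by a digit.
$subject = "Price: \xC2\xA3" . "5";

// With the u modifier, \x{00A3} matches the code point U+00A3 (£).
$count = preg_match('/\x{00A3}\d+/u', $subject, $match);

echo $count, ' ', bin2hex($match[0]), "\n";
```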

Brad Christie
Brandon Frohbieter

As @Chris pointed out, you can use paired bracket characters as delimiters, but they have to be properly balanced throughout the regex. For example, '<<>' won't work, but '<<>>' will. You can use any of (), [], {} or <>, but I recommend the braces or the square brackets; parentheses are too common in regexes, and angle brackets are used in group constructs like (?>...) (atomic group) and (?<=...) (lookbehind).

But I'm with @Brad on this one: why not just escape the delimiter character with a backslash whenever it appears in the regex?
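Both options can be sketched side by side. The pattern here is a simplified stand-in rather than the one from the question, and the sample text is invented:

```php
<?php
$text = 'Visit HTTP://Example.com today';

// Braces as delimiters: the slashes in the pattern need no escaping.
$withBraces = preg_replace('{(https?|ftp)://[-a-z0-9.]+}i', '[link]', $text);

// Traditional '/' delimiter: each '/' inside the pattern is escaped with a backslash.
$withSlashes = preg_replace('/(https?|ftp):\/\/[-a-z0-9.]+/i', '[link]', $text);

echo $withBraces, "\n", $withSlashes, "\n";
```

Both calls should produce the same string; the choice is mostly about which form stays readable for a given pattern.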

Alan Moore

You would know the data being parsed better than we would. As far as the regex engine is concerned, £ is just another character, although strictly speaking it is not ASCII (it comes from Latin-1).

Though I have to ask: what's wrong with a traditional delimiter and then just escaping it? Or using a class with a character range?

Brad Christie
  • I had escaping backwards originally (was trying to escape the delimiter instead of the occurrence of the delimiter in the expression... lol), but then I was more curious over whether it would be a bad idea to use a character like that. – Damon Mar 06 '11 at 02:32