Is it ok to use £ as delimiter in preg_replace?

Question

I am converting an eregi_replace function I found to preg_replace, but the eregi pattern contains just about every character on the keyboard. So I tried using £ as the delimiter, and it currently works, but I wonder whether it could cause problems because it is a non-standard character.

Here is the eregi:

function makeLinks($text) {
    $text = eregi_replace('(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '<a href="\\1">\\1</a>', $text);
    $text = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

and the preg:

function makeLinks($text) {
    $text = preg_replace('£(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('£([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

What a sacrilegious use of this holy symbol! The Queen shall be hearing about this. — Pekka, Mar 05 '11 at 23:35

score 4 · Answer 1 · answered Mar 05 '11 at 23:34

You can use parentheses to delimit a regex rather than a single character, for example:

preg_replace('(abc/def#ghi)i', ...);

That would probably be nicer than trying to find an obscure character that's not (yet) part of your expression.
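
A minimal sketch of that idea, assuming a made-up subject string: the outer parentheses are consumed as delimiters, so the `/` and `#` inside the pattern need no escaping.

```php
<?php
// The outer ( ... ) pair is the delimiter, not a capture group,
// and the trailing "i" is the case-insensitive modifier.
// The subject string is a made-up example.
$result = preg_replace('(abc/def#ghi)i', 'X', 'see ABC/DEF#GHI here');

echo $result, "\n";
```

The same call with `/` or `#` as the delimiter would need those characters escaped inside the pattern.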

Chris

score 4 · Accepted Answer · answered Mar 06 '11 at 00:11

£ is problematic because it isn't an ASCII character. It's from the Latin-1 charset and will only work if your PHP script is also saved in that 8-bit encoding. If your file is encoded as UTF-8, £ is represented as two bytes, and PCRE in PHP will trip over that. (At least my version does.)
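
To make the byte-level difference concrete, here is a rough sketch that spells out both encodings with explicit byte escapes, so the result does not depend on how the script file itself is saved. The variable names and the test string are made up, and the exact failure mode may vary between PHP versions; the `@` only silences the warning the failing call is expected to raise.

```php
<?php
// £ written out byte by byte (hypothetical variable names).
$latin1 = "\xA3";      // single Latin-1 byte
$utf8   = "\xC2\xA3";  // two-byte UTF-8 sequence

$lenLatin1 = strlen($latin1);
$lenUtf8   = strlen($utf8);

// One-byte delimiter: PCRE receives the pattern "abc" with the i modifier.
$okLatin1 = preg_match($latin1 . 'abc' . $latin1 . 'i', 'xABCx');

// Two-byte form: only the first byte is taken as the delimiter, so the
// second byte lands where a modifier is expected.
$okUtf8 = @preg_match($utf8 . 'abc' . $utf8 . 'i', 'xABCx');

var_dump($lenLatin1, $lenUtf8, $okLatin1, $okUtf8);
```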

score 2 · Answer 3 · edited Mar 06 '11 at 02:33

You can use the Unicode character, just to be sure.

\u00A3

Watch out for the ereg functions and Unicode support.

http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/characters.html

Long live the Queen.

edited Mar 06 '11 at 02:33

Brad Christie

answered Mar 05 '11 at 23:36

Brandon Frohbieter

Alan Moore · Answer 4 · 2011-03-07T21:52:13.887

As @Chris pointed out, you can use paired bracket characters as delimiters, but they have to be properly balanced throughout the regex. For example, '<<>' won't work, but '<<>>' will. You can use any of (), [], {} or <>, but I recommend the braces or the square brackets; parentheses are too common in regexes, and angle brackets are used in escape sequences like (?>...) (atomic group) and (?<=...) (lookbehind).
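
As a sketch of how this could look for the patterns in the question, here they are wrapped in braces. The function name makeLinksBraces and the sample input are made up for illustration. The braces already inside the patterns (`{1}` and the `{}` in the character class) happen to be balanced, which is why this particular choice works without extra escaping.

```php
<?php
// makeLinksBraces is a hypothetical name; the patterns are the ones from
// the question, delimited by { ... } instead of a single character.
function makeLinksBraces($text) {
    $text = preg_replace('{(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)}i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('{([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)}i',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

// Made-up sample input.
$linked = makeLinksBraces('see http://example.com now');
echo $linked, "\n";
```

Square brackets would be a poorer fit for these two patterns, because the literal `[` inside the second character class leaves the brackets unbalanced.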

But I'm with @Brad on this one: why not just escape the delimiter character with a backslash whenever it appears in the regex?

score 1 · Answer 5 · answered Mar 05 '11 at 23:33

You would know the data being parsed better than we would. As far as regex is concerned, it's no different than any other ASCII value.

Though I have to ask: what's wrong with a traditional delimiter and then just escaping it? Or using a class with a character range?
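
For illustration, a small sketch of the escaping route with the usual `/` delimiter. The URLs and variable names are made up; preg_quote() with the delimiter as its second argument is one way to avoid escaping a literal string by hand.

```php
<?php
// Option 1: escape each literal "/" in the pattern by hand.
$byHand = preg_match('/^https?:\/\/example\.com\/$/i', 'HTTP://example.com/');

// Option 2: let preg_quote() escape a literal string; passing '/' as the
// second argument makes it escape the delimiter as well.
$literal = 'http://example.com/a+b';   // made-up literal to search for
$pattern = '/' . preg_quote($literal, '/') . '/i';
$quoted  = preg_match($pattern, 'Visit HTTP://EXAMPLE.COM/A+B today');

var_dump($byHand, $quoted);
```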

Brad Christie

I had escaping backwards originally (was trying to escape the delimiter instead of the occurrence of the delimiter in the expression... lol), but then I was more curious over whether it would be a bad idea to use a character like that. – Damon Mar 06 '11 at 02:32
