
Why is this preg_replace not working?

FYI, the PHP script is saved as UTF-8 without BOM. For easier testing, the function here removes all matches of the pattern (what I will actually do is the opposite: remove all non-matches). Note also that the character ā is not in my regex, so it should be the only character left behind.

$string='The Story of Jewād';
echo preg_replace('@([!"#$&’\(\)\*\+,\-\./0123456789:;<=>\?ABCDEFGHIJKLMNOPQRSTUVWXYZ\[\\\]\^_‘abcdefghijklmnopqrstuvwxyz\{\|\}~¡¢£⁄¥ƒ§¤“«‹›fifl–†‡·¶•‚„”»…‰¿`´ˆ˜¯˘˙¨˚¸˝˛ˇ—ÆªŁØŒºæıłøœß÷¾¼¹×®Þ¦Ð½−çð±Çþ©¬²³™°µ ÁÂÄÀÅÃÉÊËÈÍÎÏÌÑÓÔÖÒÕŠÚÛÜÙÝŸŽáâäàåãéêëèíîïìñóôöòõšúûüùýÿž€\'])@u','',$string);

The result I get is $string unchanged. Why would this be?

Alasdair
  • Try with `\pL+` instead of relisting accented letters individually. – mario Mar 16 '13 at 15:54
  • Might it not be easier to write a regex that matches the characters you do want to allow, rather than listing all the non-allowed characters? Also, for digits you can use `\d`, and for contiguous ranges you can use things like `A-Z`. That will make the expression shorter and easier to manage. – Spudley Mar 16 '13 at 15:56
  • @Spudley, yes, that is what I am doing. The above example is inverted for easy testing. – Alasdair Mar 16 '13 at 16:09
  • @mario, I can't use `\pL+` because this list is specific. It is all the characters I can use in a specific font I am using. – Alasdair Mar 16 '13 at 16:09

1 Answer


The reverse test, which removes only the ā, works:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
<?php 

$string='The Story of Jewād';
// The u modifier makes PCRE treat ā as one code point rather than two bytes
echo preg_replace('@([ā])@u','',$string);

?>
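
One candidate for the syntax problem, assuming the pattern is written in a single-quoted PHP string as in the question: PHP itself collapses `\\` to a single backslash before PCRE sees the pattern, so `\[\\\]` arrives as `\[\\]`. PCRE then reads `\\` as a literal backslash and the following `]` closes the character class early, leaving the rest of the list as a literal sequence that never occurs in the input. A minimal sketch with a shortened, hypothetical character list (`$broken` and `$fixed` are names made up for illustration):

```php
<?php
$string = 'The Story of Jewād';

// Three backslashes before ]: PHP turns \\ into \, so PCRE receives \[\\]
// and the character class ends right after the literal backslash.
$broken = '@[A-Z\[\\\]\^_a-z ]@u';

// Five backslashes before ]: PCRE receives \[\\\] and the class stays open,
// covering [, \ and ] as literals.
$fixed = '@[A-Z\[\\\\\]\^_a-z ]@u';

echo $broken, "\n";                             // @[A-Z\[\\]\^_a-z ]@u
echo preg_replace($broken, '', $string), "\n";  // The Story of Jewād (unchanged)
echo preg_replace($fixed, '', $string), "\n";   // ā
```

Printing the pattern with `echo` is a quick way to check what PCRE actually receives after PHP's own string escaping.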

So there is just a syntax problem somewhere in the longer pattern. Listing every character individually in a regex is not a good idea; you can express the set as ranges instead, something like this:

ltrChars : 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF';
rtlChars : '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC';
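
The ranges above use JavaScript-style `\uXXXX` escapes, which PHP's preg functions do not accept as far as I know. In PCRE the same idea would be written with `\x{XXXX}` plus the `u` modifier. A rough sketch, assuming a hypothetical allow-list of printable ASCII and some Latin-1 letters (the variable names and the chosen ranges are illustrative only, not the font's real character set):

```php
<?php
$string = 'The Story of Jewād';

// Hypothetical allow-list: printable ASCII plus Latin-1 letters.
// In a single-quoted string, \x{...} reaches PCRE untouched.
$allowed = '\x{0020}-\x{007E}\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{00FF}';

// Negated class: remove every character that is NOT in the allow-list.
$pattern = '@[^' . $allowed . ']@u';

echo preg_replace($pattern, '', $string), "\n";  // The Story of Jewd
```

Here ā (U+0101) falls outside the listed ranges, so it is the only character removed.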
Mostafa Shahverdy
  • I need to list all the characters out specifically because these are all the characters I have in a font. – Alasdair Mar 16 '13 at 16:10
  • Well, at least I can see some ranges in there, like A-Z or 0-9. – Mostafa Shahverdy Mar 16 '13 at 16:11
  • Your method here did not work exactly, but with a small change it did: `@([^\x{0020}-\x{007E}\x{FB01}\x{FB02}\x{00A1}-\x{00AC}\x{00AE}-\x{00FF}\x{0160}\x{0161}\x{0192}\x{2013}\x{2018}-\x{201A}\x{2020}-\x{2022}\x{2026}\x{2030}\x{2039}\x{2044}\x{201C}-\x{201E}\x{203A}\x{02C6}\x{02D8}-\x{02DD}\x{02C7}\x{2014}\x{0141}\x{0142}\x{0131}\x{0152}\x{0153}\x{2212}\x{2122}\x{0178}\x{017D}\x{017E}\x{20AC}])@u` – Alasdair Mar 17 '13 at 06:11