I have a search input that I clean up with preg_replace, but I want the search input to accept other languages as well.

Keep - Chinese, Japanese, German, etc.

Remove - symbol characters like @#$%^*()

This one only keeps English letters and digits: preg_replace("/[^a-zA-Z0-9]+/", "", $search);

Is there any way to set this up for multiple languages?

user2178521
  • So, what *do* you want to keep and what do you want to remove? Might it be easier to use a blacklist rather than a whitelist? – deceze Jun 12 '13 at 07:38
  • To regex/php, `æ` is as much of a unicode character as `^`. Your best bet is to make a blacklist, and simply replace with that. (`preg_replace("/[!\"#¤%&()=\/]+/u", "", $search);`). – h2ooooooo Jun 12 '13 at 07:39

2 Answers

Although it is written for Java, there is a concise overview here.

You can use the so-called POSIX notation:

[^\p{Alnum}\p{M}]

The first is the alphanumeric group, and the second is the combining diacritical marks: the accents. The latter should not be forgotten, because ĉ can be written as the single Unicode code point c-circumflex, but also as 'c' followed by a combining circumflex ^ (zero width, represented here by the normal circumflex). In some languages a base letter can take more than one mark.


Correction:

[^\p{L}\p{N}\p{M}]
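
As a rough sketch of how the corrected class might be applied in PHP (the language of the question): the helper name `keep_letters_digits_marks` is hypothetical, and the `u` modifier is an assumption on my part, added so that PCRE treats the pattern and the subject as UTF-8. The example checks that a decomposed ĉ survives while symbols are stripped.

```php
<?php
// Hypothetical helper: keeps letters (\p{L}), numbers (\p{N}) and
// combining marks (\p{M}); everything else is removed.
function keep_letters_digits_marks(string $search): string
{
    // The u modifier (an assumption, not part of the answer above)
    // makes PCRE read the pattern and the subject as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\p{M}]+/u', '', $search);

    // preg_replace returns null on failure, e.g. for invalid UTF-8 input.
    return $clean ?? '';
}

// 'c' followed by U+0302 COMBINING CIRCUMFLEX ACCENT (decomposed ĉ).
$decomposed = "c\u{0302}";

// The symbols are dropped; the base letter and its mark both remain.
var_dump(keep_letters_digits_marks($decomposed . '@#$') === $decomposed);
```

Without `\p{M}` in the class, the combining circumflex would be removed and only the bare `c` would be left.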
Joop Eggen

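Here's the PHP Unicode regex reference. The plus + isn't necessary, because preg_replace replaces every match in the string. The \s will match all whitespace characters. The u modifier after the closing delimiter tells PCRE to treat the pattern and the subject as UTF-8; without it, multibyte characters are examined byte by byte and can be mangled.

preg_replace("![^\p{L}\p{N}\s]!u", "", $search);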

If you want to match only the space character itself you would add it in the brackets as a literal:

preg_replace("![^\p{L}\p{N} ]!u", "", $search);
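
To make the difference between the two variants concrete, here is a small sketch. The function name `sanitize_search` and its `$keepAllWhitespace` flag are hypothetical, and the `u` modifier is assumed so that the input is handled as UTF-8.

```php
<?php
// Hypothetical wrapper around the two patterns discussed above.
function sanitize_search(string $search, bool $keepAllWhitespace = true): string
{
    // \s keeps tabs, newlines, etc.; a literal space keeps only spaces.
    $pattern = $keepAllWhitespace
        ? '/[^\p{L}\p{N}\s]/u'
        : '/[^\p{L}\p{N} ]/u';

    // preg_replace returns null on failure (e.g. invalid UTF-8 input).
    return preg_replace($pattern, '', $search) ?? '';
}

// Letters from several scripts stay; the symbols are removed.
echo sanitize_search('日本語 Größe @#$%^*() 123'), "\n";

// A tab survives with \s but not with the literal-space variant.
var_dump(sanitize_search("a\tb") === "a\tb");
var_dump(sanitize_search("a\tb", false) === "ab");
```

Note that removing the symbols leaves the spaces around them in place, so the result can contain consecutive spaces; collapsing those would be a separate step.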

Update: Added the bit about spaces per a comment request.

AbsoluteƵERØ