1

I want to use regular expression to filter out pure Chinese name by this:

SELECT `name` FROM `table`  WHERE `name` REGEXP '[u4e00-u9fa5]';

But, according to this, it isn't possible,so I want to approach that from opposite direction,find content without any letter,digital and special character(I knew that it is not rigid),but can't find "and" operator,so how to do it?

Community
  • 1
  • 1
Alex Luya
  • 9,412
  • 15
  • 59
  • 91
  • Sorry, "words" without letters are impossible, right? Also, If there should be no word, nor special chars, do you need to only match whitespace or empty? Too unclear for me. – Wiktor Stribiżew Jul 29 '16 at 12:14
  • You can make a negative character class, like `[^A-Za-z0-9./,]`, and keep adding to it as necessary, but I'm not clear if that's what you're asking. – Steven Doggart Jul 29 '16 at 12:19
  • @WiktorStribiżew Sorry,I mean Chinese character or word. – Alex Luya Jul 29 '16 at 12:22
  • @a_horse_with_no_name I am using MariaDB – Alex Luya Jul 29 '16 at 12:23
  • @AlexLuya: MariaDB uses PCRE, right? See [*Starting with MariaDB 10.0.5, MariaDB switched to the PCRE regular expression library for enhanced regular expressions.*](https://mariadb.com/kb/en/mariadb/pcre/) Try `REGEXP '[\\x{4e00}-\\x{9fa5}]'` or even `REGEXP '\\p{Han}'`. – Wiktor Stribiżew Jul 29 '16 at 12:26
  • @WiktorStribiżew,you suggestion can't handle combinations like:"A张方" and I want to get this kind of name out also. – Alex Luya Jul 29 '16 at 12:37
  • @AlexLuya: [`[\\x{4e00}-\\x{9fa5}]` finds a match in that string](https://regex101.com/r/jP8bO9/1). [So does `\p{Han}`](https://regex101.com/r/jP8bO9/2) – Wiktor Stribiżew Jul 29 '16 at 12:42
  • See also http://stackoverflow.com/questions/38656686/why-does-mariadb-regex-give-contrary-result – Rick James Jul 30 '16 at 15:52

1 Answers1

2

MariaDB uses PCRE regex library beginning with 10.0.5 version: "Starting with MariaDB 10.0.5, MariaDB switched to the PCRE regular expression library for enhanced regular expressions.".

To match entries that contain Chinese letters use

REGEXP '[\\x{4e00}-\\x{9fa5}]'

or even

REGEXP '\\p{Han}'

To match the reverse, entries with no Chinese letters, use:

REGEXP '^[^\\x{4e00}-\\x{9fa5}]*$'

or

REGEXP '^\\P{Han}*$'
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Stribizew, SELECT `name` FROM `table` WHERE `name` REGEXP '^\\p{Han}*$',won't match entries with no Chinese letters,but pure Chinese word. – Alex Luya Jul 30 '16 at 01:57
  • Yes, I know. `'^\\p{Han}*$'` matches a string that only consists of Chinese letters. And `'^\\P{Han}*$'` will match a string that consists of characters other than Chinese letters. Note the uppercase `P` in the latter regex that means we are reversing the Unicode property class. – Wiktor Stribiżew Jul 30 '16 at 06:25