0

I tried:

select * from table1 where HEX(col1) NOT REGEXP '^(..)*(E[4-9]))'

but I am still getting col1 with Chinese characters like this full entry '[动物园hello'. I need to select col1 entries that do not contain any single Chinese character like this exact full entry 'abcd'.

Edville
  • 345
  • 1
  • 5
  • 18
  • Provide some sample data (2-3 `HEX(col1)` values with and without Chinese characters). – Akina Nov 19 '20 at 06:24
  • Too low examples. Provide at least 2-3 wrong rows as an output of `select col1, hex(col1) from table1`, please. – Akina Nov 19 '20 at 06:46
  • This might help: https://stackoverflow.com/questions/1366068/whats-the-complete-range-for-chinese-characters-in-unicode – kmoser Nov 19 '20 at 06:50
  • Think about backward task - select only rows which matches valid pattern, like `where HEX(col1) REGEXP '^(\ -\~)*'$`... – Akina Nov 19 '20 at 06:52

1 Answers1

0

I got two way to do this but neither of those is pinpoint accurate. but still worth a shot.

WHERE length(col) = char_length(col)

but i think this only work with UTF-8.

another solution:

WHERE col not REGEXP '[\u4e00-\u9fa5]'

this should work too but depend on the extension and other chinese char unicode might need to change those range. I believe @kmoser comment have some really detail info about this problem. but first solution is kinda useful most of the time.

T. Peter
  • 887
  • 4
  • 13