
I am trying to migrate an Oracle table to MySQL, but due to issues on the target side I can only use MySQL's `utf8` character set, i.e. a maximum of 3 bytes per character.

I would like to run a SQL query to identify whether any record has a Unicode character beyond the 3-byte range, i.e. outside U+0000 to U+FFFF.

Put another way: how can I identify table rows containing a Unicode character with a code point between U+10000 and U+10FFFF?

  • Characters in UTF-8 can be 1 to 4 bytes, not 3. I think the better option is to use `utf8mb4` instead of `utf8`, see https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql – Wernfried Domscheit Sep 12 '18 at 06:10
  • Oracle-to-MySQL migration doesn't support utf8mb4 in AWS. It's a limitation of DMS, hence exploring the utf8mb3 option. – user3631935 Sep 12 '18 at 11:28

1 Answer


MySQL's CHARACTER SET utf8 is the same as utf8mb3. The outside world's UTF-8 corresponds to MySQL's utf8mb4. Try to use utf8mb4.

If you must locate 4-byte UTF-8 characters coming in, convert to HEX and search for lead bytes F0 through F4, which begin all 4-byte UTF-8 sequences. Since we don't know what your data looks like, nor what tools you might have available, I cannot be more specific.
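As a cross-check outside the database, you can scan exported rows in a script: any character with a code point above U+FFFF needs 4 bytes in UTF-8 and will not fit in utf8mb3. A minimal Python sketch (the sample rows and column layout are assumptions for illustration):

```python
# Flag strings containing characters outside the BMP (U+10000..U+10FFFF),
# i.e. characters that take 4 bytes in UTF-8 and cannot be stored in
# MySQL's utf8 / utf8mb3 character set.
def has_supplementary_chars(s: str) -> bool:
    return any(ord(ch) > 0xFFFF for ch in s)

# Hypothetical exported rows: (id, text_column)
rows = [
    (1, "plain ascii"),
    (2, "caf\u00e9"),             # U+00E9: 2 bytes in UTF-8, fits utf8mb3
    (3, "emoji \U0001F600 here"), # U+1F600: 4 bytes in UTF-8, does NOT fit
]

bad_ids = [row_id for row_id, text in rows if has_supplementary_chars(text)]
print(bad_ids)  # → [3]
```

The same predicate (code point > U+FFFF) is exactly the U+10000–U+10FFFF range asked about, so rows it flags are the ones that would need utf8mb4 on the MySQL side.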

Rick James