
I am trying to migrate an Oracle table to MySQL, but due to issues on the target side I can only use MySQL's `utf8` character set, i.e. a maximum of 3 bytes per character.

I would like to run a SQL query to identify whether any record has a Unicode character beyond the 3-byte range, i.e. outside U+0000 to U+FFFF.

Put another way: how can I identify table rows containing a Unicode character with a code point between U+10000 and U+10FFFF?

  • Characters in UTF-8 can be 1 to 4 bytes, not 3. I think the better option is to use `utf8mb4` instead of `utf8`, see https://stackoverflow.com/questions/30074492/what-is-the-difference-between-utf8mb4-and-utf8-charsets-in-mysql – Wernfried Domscheit Sep 12 '18 at 06:10
  • Oracle-to-MySQL migration doesn't support utf8mb4 in AWS. It's a limitation of DMS, hence exploring the utf8mb3 option. – user3631935 Sep 12 '18 at 11:28

1 Answer


MySQL's CHARACTER SET utf8 is the same as utf8mb3. The outside world's UTF-8 corresponds to MySQL's utf8mb4. Try to use utf8mb4.

If you must locate 4-byte UTF-8 characters coming in, convert to HEX and search for lead bytes F0 through F4, which begin all 4-byte UTF-8 sequences. Since we don't know what your data looks like, nor what tools you might have available, I cannot be more specific.
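As a cross-check outside the database, you can scan exported rows in a script: any character with a code point above U+FFFF needs 4 bytes in UTF-8 and will not fit in utf8mb3. A minimal Python sketch (the sample rows and column layout are assumptions for illustration):

```python
# Flag strings containing characters outside the BMP (U+10000..U+10FFFF),
# i.e. characters that take 4 bytes in UTF-8 and cannot be stored in
# MySQL's utf8 / utf8mb3 character set.
def has_supplementary_chars(s: str) -> bool:
    return any(ord(ch) > 0xFFFF for ch in s)

# Hypothetical exported rows: (id, text_column)
rows = [
    (1, "plain ascii"),
    (2, "caf\u00e9"),             # U+00E9: 2 bytes in UTF-8, fits utf8mb3
    (3, "emoji \U0001F600 here"), # U+1F600: 4 bytes in UTF-8, does NOT fit
]

bad_ids = [row_id for row_id, text in rows if has_supplementary_chars(text)]
print(bad_ids)  # → [3]
```

The same predicate (code point > U+FFFF) is exactly the U+10000–U+10FFFF range asked about, so rows it flags are the ones that would need utf8mb4 on the MySQL side.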

Rick James