0

I tried the suggestions mentioned (here) to check for the rows with non-utf8 characters in a column. I compared this by comparing the LENGTH and LENGTHB values of the column. However, there are around thousands of rows as such.

For one such row, the value in the column was alphanumeric with only one special character i.e. the pound symbol -> £ . (There are other records as well with the bullet listing symbol found in MS Word which come as  in Notepad++)

I used function RAWTOHEX on the column and the output did have C2A3 which is the representation of £ symbol. I'm confused as to how do I prove that it is a non-utf8 character, and on second thoughts is the method of comparing the length of the column the right way to find out?

Note:

SELECT * FROM NLS_DATABASE_PARAMETERS;
----------------------------------------
NLS_NCHAR_CHARACTERSET  AL16UTF16
NLS_CHARACTERSET    AL32UTF8

PS: I am using SQL developer, but I am not an oracle developer.

djgcp
  • 163
  • 1
  • 14
  • 1
    What is your problem and what do you mean by "invalid UTF-8 character"? – Wernfried Domscheit Oct 06 '20 at 18:14
  • You correctly pointed out that `C2A3` represents the `£` in UTF-8 - so what on earth do you mean by "that it is a non-utf8 character"? – Wernfried Domscheit Oct 06 '20 at 18:21
  • Perhaps you mean non-ASCII characters? – user9601310 Oct 07 '20 at 04:36
  • @WernfriedDomscheit We are using a third party tools to capture CDC data, and the tool generates an error if the data has non-UTF8 characters. Hence we need to identify those if present in the table. Do you know of way way how to identify them? – djgcp Oct 07 '20 at 10:42
  • 2
    Again, what do you mean by "non-UTF8 characters"? – Wernfried Domscheit Oct 07 '20 at 11:50
  • I've got the same question. There are articles on invalid utf-8 data here https://stackoverflow.com/questions/1319022/really-good-bad-utf-8-example-test-data and here https://stackoverflow.com/questions/1301402/example-invalid-utf8-string – Gerrit Griebel Feb 23 '21 at 08:07

0 Answers0