1

I have data in a Snowflake table column that looks like this...

These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f 

That is the complete value of one column

The UTF-8 decoded value of those characters above is なぜ日銀だけ違うのか?

Can someone suggest an update SQL statement that would change the value of the column to

These are actually Japanese characters なぜ日銀だけ違うのか?

Better still, is there an update statement that will decode any UTF-8 bytes to convert them into the desired characters

JRomeo
  • 543
  • 1
  • 4
  • 20

1 Answers1

2

A quick JS UDF solves the problem:

create or replace function decode_utf8(S string)
returns string
language javascript
as
$$
return decodeURIComponent(escape(S));
$$;

In action:

select decode_utf8('These are actually Japanese characters \xe3\x81\xaa\xe3\x81\x9c\xe6\x97\xa5\xe9\x8a\x80\xe3\x81\xa0\xe3\x81\x91\xe9\x81\x95\xe3\x81\x86\xe3\x81\xae\xe3\x81\x8b\xef\xbc\x9f ')

Screenshot as proof:

enter image description here

(based on https://stackoverflow.com/a/13691499/132438)

Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
  • This is great, and it works just as you show here. Thank you. I see however, that it works for string literals only. If I provide a column name, it seems to have no effect. For example SELECT TBL_COL, DECODE_UTF8(TBL_COL) AS TBL_COL_NEW FROM TBL, then TBL_COL _NEW has the same, unchanged value as TBL_COL in the result set. Is there a way to modify it to get the same effect? – JRomeo Jun 03 '22 at 16:18
  • Depends how the columns are encoded. Can you provide a new question that starts with INSERT statements, so I can reproduce your problem? – Felipe Hoffa Jun 03 '22 at 21:57
  • Yes it works only for string literals not for columns – Rishi Mukunthan Apr 25 '23 at 10:33