I am trying to load a table in Hive from a CSV file that contains accented letters. I was initially using OpenCSVSerde to parse the CSV file and load it into the table. However, the accented letters are printed as � instead. I have tried several approaches, but none of them work.
- Using OpenCSVSerde, I tried declaring serialization.encoding = 'windows-1252' inside the TBLPROPERTIES section; it did not resolve the issue. I checked again by adding serialization.encoding = 'windows-1252' inside WITH SERDEPROPERTIES; that also did not work.
- Using OpenCSVSerde, I tried explicitly declaring serialization.encoding = 'utf-8' inside WITH SERDEPROPERTIES; it did not work.
- Using OpenCSVSerde, I tried declaring serialization.encoding = 'ISO-8859-1' inside WITH SERDEPROPERTIES; it did not work.
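For reference, the OpenCSVSerde attempts above correspond to DDL along these lines (the table name, columns, and location are placeholders, not my real schema):

```sql
-- Sketch of the OpenCSVSerde attempts; table/columns/location are made up.
CREATE EXTERNAL TABLE demo_accented (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar'          = ',',
  'quoteChar'              = '"',
  'serialization.encoding' = 'windows-1252'  -- also tried 'utf-8' and 'ISO-8859-1'
)
STORED AS TEXTFILE
LOCATION '/data/demo_accented'
TBLPROPERTIES ('serialization.encoding' = 'windows-1252');
```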
I then switched over to LazySimpleSerDe, as I read somewhere that it is compatible with accented characters. I tried setting serialization.encoding = 'windows-1252' inside WITH SERDEPROPERTIES, which fixed the accented letters but introduced a new error: some of the text columns contained quoted fields, which were split on the delimiter and loaded incorrectly into the table.
- So I tried using 'quote.delim'='"' inside WITH SERDEPROPERTIES, which did not fix the incorrect data split.
- I tried using 'quoteChar'='"' inside WITH SERDEPROPERTIES, which also did not fix the incorrect data split.
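The LazySimpleSerDe variant looked roughly like this (again with placeholder names). As far as I can tell, LazySimpleSerDe honours serialization.encoding but does no quote handling of its own, so properties like 'quoteChar' have no effect and quoted fields containing commas still split:

```sql
-- Sketch of the LazySimpleSerDe attempt; table/columns/location are made up.
CREATE EXTERNAL TABLE demo_accented (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'            = ',',
  'serialization.encoding' = 'windows-1252',
  'quoteChar'              = '"'  -- also tried 'quote.delim'='"'; neither helped
)
STORED AS TEXTFILE
LOCATION '/data/demo_accented';
```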
I reverted to OpenCSVSerde and tried using serialization.encoding = 'ISO-8859-1' inside WITH SERDEPROPERTIES, as well as store.charset = 'ISO-8859-1', retrieve.charset = 'ISO-8859-1' inside TBLPROPERTIES. This solved the incorrect data split but brought me back to not being able to print the accented characters. I also tried serialization.encoding = 'utf-16' inside WITH SERDEPROPERTIES, which, unsurprisingly, did not resolve the issue.
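The final OpenCSVSerde attempt combined both property blocks, roughly:

```sql
-- Sketch of the last OpenCSVSerde attempt; table/columns/location are made up.
CREATE EXTERNAL TABLE demo_accented (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar'          = ',',
  'quoteChar'              = '"',
  'serialization.encoding' = 'ISO-8859-1'  -- also tried 'utf-16'
)
STORED AS TEXTFILE
LOCATION '/data/demo_accented'
TBLPROPERTIES (
  'store.charset'    = 'ISO-8859-1',
  'retrieve.charset' = 'ISO-8859-1'
);
```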
Can anyone tell me how I can use OpenCSVSerde so that the accented letters are read and printed correctly?