I am trying to load a CSV file that contains accented letters into a Hive table. I initially used OpenCSVSerde to parse the CSV file and load it into the table, but the accented letters are printed as � instead. I have tried several approaches and none of them has worked.
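For reference, the table definition I started with looks roughly like this (the table name, columns, and location are placeholders, not my real schema):

```sql
-- Placeholder schema: the real table has more columns, but the shape of the DDL is the same.
CREATE EXTERNAL TABLE my_table (
  id          STRING,
  name        STRING,
  description STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar' = ',',
  'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION '/path/to/csv/data';
```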

  1. With OpenCSVSerde, I declared serialization.encoding = 'windows-1252' inside TBLPROPERTIES, which did not resolve the issue. I then declared the same property inside WITH SERDEPROPERTIES; that did not work either (see the sketch after this list for where each property went).
  2. With OpenCSVSerde, I explicitly declared serialization.encoding = 'utf-8' inside WITH SERDEPROPERTIES; it did not work.
  3. With OpenCSVSerde, I declared serialization.encoding = 'ISO-8859-1' inside WITH SERDEPROPERTIES; it did not work.
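
Concretely, the placements tried above look like this (shown here as ALTER TABLE statements against the placeholder table for brevity; the actual attempts declared the same properties in the CREATE TABLE DDL):

```sql
-- Attempt 1: windows-1252, first in TBLPROPERTIES, then in SERDEPROPERTIES.
ALTER TABLE my_table SET TBLPROPERTIES ('serialization.encoding' = 'windows-1252');
ALTER TABLE my_table SET SERDEPROPERTIES ('serialization.encoding' = 'windows-1252');

-- Attempt 2: explicit UTF-8 in SERDEPROPERTIES.
ALTER TABLE my_table SET SERDEPROPERTIES ('serialization.encoding' = 'utf-8');

-- Attempt 3: ISO-8859-1 in SERDEPROPERTIES.
ALTER TABLE my_table SET SERDEPROPERTIES ('serialization.encoding' = 'ISO-8859-1');
```

In every case the query output still showed � where the accented letters should be.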

I then switched over to LazySimpleSerDe, as I had read somewhere that it handles accented characters. Setting serialization.encoding = 'windows-1252' inside WITH SERDEPROPERTIES did fix the encoding, but it introduced a new problem: some of the text columns contain quotes, which caused the data to be split and loaded incorrectly into the table.

  1. I tried adding 'quote.delim'='"' inside WITH SERDEPROPERTIES, which did not fix the incorrect data split (see the sketch after this list).
  2. I tried 'quoteChar'='"' inside WITH SERDEPROPERTIES instead, which also did not fix it.
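
The LazySimpleSerDe attempt looked roughly like this (same placeholder table; the last property was 'quote.delim' in the first try and 'quoteChar' in the second):

```sql
-- The accented letters display correctly with this setup, but quotes in the
-- text columns still cause the data to be split and loaded incorrectly.
CREATE EXTERNAL TABLE my_table (
  id          STRING,
  name        STRING,
  description STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim'            = ',',
  'serialization.encoding' = 'windows-1252',
  'quote.delim'            = '"'   -- second try used 'quoteChar' = '"' instead
)
STORED AS TEXTFILE
LOCATION '/path/to/csv/data';
```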

I reverted to OpenCSVSerde and tried serialization.encoding = 'ISO-8859-1' inside WITH SERDEPROPERTIES together with store.charset = 'ISO-8859-1' and retrieve.charset = 'ISO-8859-1' inside TBLPROPERTIES. That solved the incorrect data split but brought me back to not being able to display the accented characters. I also tried serialization.encoding = 'utf-16' inside WITH SERDEPROPERTIES, which, unsurprisingly, did not resolve the issue. The definition I am currently on looks roughly like the sketch below.
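
```sql
-- The incorrect split is gone, but accented letters come back as �.
CREATE EXTERNAL TABLE my_table (
  id          STRING,
  name        STRING,
  description STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'separatorChar'          = ',',
  'quoteChar'              = '"',
  'serialization.encoding' = 'ISO-8859-1'   -- also tried 'utf-16' here
)
STORED AS TEXTFILE
LOCATION '/path/to/csv/data'
TBLPROPERTIES (
  'store.charset'    = 'ISO-8859-1',
  'retrieve.charset' = 'ISO-8859-1'
);
```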

Can anyone tell me how I can use OpenCSVSerde so that the accented letters are loaded and displayed correctly?
