I am querying JSON-formatted data with Apache Drill on Windows 10 from the command prompt, following their guide.
I have this very basic JSON object:
{"år":"2018", "æøå":"ÆØÅ"}
When I query it from Apache Drill, the non-ASCII characters are not displayed correctly:
select * from dfs.`C:\Users\foo\Downloads\utf8.json`;
+-------+------+
| Õr | µ°Õ |
+-------+------+
| 2018 | ãÏ┼ |
+-------+------+
1 row selected (0,114 seconds)
The file is saved as UTF-8 (using Sublime Text). I also tried saving it as UTF-8 with BOM, but that made no difference.
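To rule out the file itself, its raw bytes can be checked independently of the editor. The following is only a minimal sketch in Python: `inspect_utf8` is a hypothetical helper name, and the sample bytes are built from the JSON object above rather than read from the actual file.

```python
import codecs


def inspect_utf8(raw: bytes) -> dict:
    """Report whether raw bytes start with a UTF-8 BOM and decode as UTF-8."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        # "utf-8-sig" strips a leading BOM if present, else acts like "utf-8".
        raw.decode("utf-8-sig")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"has_bom": has_bom, "valid_utf8": valid}


# Stand-in for the file contents (assumption: the file holds exactly this object).
sample = '{"år":"2018", "æøå":"ÆØÅ"}'.encode("utf-8")
print(inspect_utf8(sample))

# Hypothetical usage against the real file:
# with open(r"C:\Users\foo\Downloads\utf8.json", "rb") as fh:
#     print(inspect_utf8(fh.read()))
```

If this reports valid UTF-8 both with and without the BOM, the file is probably not where the characters get damaged.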
Setting the environment variable as mentioned in this SO thread, using
set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
does not help either.
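For what it is worth, the garbled characters follow a recognizable pattern. The sketch below, in Python, assumes the text is written as windows-1252 bytes and rendered by a console using code page 850. Both code pages are guesses inferred from the output above, not something I have confirmed on this machine, and `as_seen_in_console` is a hypothetical helper name.

```python
def as_seen_in_console(text: str,
                       written_as: str = "cp1252",
                       shown_as: str = "cp850") -> str:
    """Encode text in one code page and decode the same bytes in another.

    Both defaults are assumptions: windows-1252 for the bytes being written,
    code page 850 for the console that renders them.
    """
    return text.encode(written_as).decode(shown_as)


# Original strings from the JSON file mapped to what the query printed.
observed = {"år": "Õr", "æøå": "µ°Õ", "ÆØÅ": "ãÏ┼"}

for original, garbled in observed.items():
    # Prints True when the assumed code-page mix-up reproduces the output.
    print(as_seen_in_console(original) == garbled)
```

Under these assumptions all three strings match the query output exactly, which would point at a mismatch between the bytes written to the console and the console's code page rather than at the JSON file.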
EDIT:
Shortly after posting, I found an SO thread that suggested changing the Windows code page to 65001 (UTF-8). This displays the letters correctly, but it also stops the command history (arrow up) from working properly.
chcp 65001
sqlline.bat -u "jdbc:drill:zk=local"
select * from dfs.`C:\Users\cgu\Downloads\utf8.json`;
+-------+------+
| år | æøå |
+-------+------+
| 2018 | ÆØÅ |
+-------+------+