If you are facing the same problem and every character you need is covered by the ANSI text encoding (Windows code page 1252, a close relative of ISO 8859-1), you can use that encoding instead as a temporary workaround for the UTF-8 problem. UTF-8 remains the modern standard, though, because it covers every script and is the better choice for full localisation.
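A rough sketch of that workaround, assuming the console has first been switched to code page 1252 (for example with chcp 1252): the input is decoded as windows-1252 instead of UTF-8. The class name Cp1252Input and the helper readLine are hypothetical, and the byte array only simulates what the console would send for "èéëê".

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

class Cp1252Input {
    // Assumption: the console code page is 1252, so input bytes are windows-1252.
    static final Charset CONSOLE_CHARSET = Charset.forName("windows-1252");

    /** Reads one line from the stream, decoding it with the console charset. */
    static String readLine(InputStream in) {
        Scanner scanner = new Scanner(in, CONSOLE_CHARSET);
        return scanner.hasNextLine() ? scanner.nextLine() : "";
    }

    public static void main(String[] args) {
        // Simulated console bytes for "èéëê" in windows-1252: E8 E9 EB EA.
        byte[] simulated = {(byte) 0xE8, (byte) 0xE9, (byte) 0xEB, (byte) 0xEA, '\n'};
        String line = readLine(new ByteArrayInputStream(simulated));
        // Compare against Unicode escapes so the check does not depend on source encoding.
        System.out.println(line.equals("\u00e8\u00e9\u00eb\u00ea")); // prints "true"
    }
}
```

In a real run, System.in would be passed to readLine in place of the simulated stream. This only helps while every character you need exists in code page 1252.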
I'm creating an application that has to read user input containing accented characters from the console. From what I've read online, modern consoles can display accented characters in output and encode them correctly in input, even though they show up as ? before the command is sent.
PS C:\> echo ?
ü
PS C:\>
Note: this behaviour is not reproducible in Command Prompt. Command Prompt, when run in Windows Terminal, seems to display accented characters correctly before sending as well.
However, when running the following test code:
package com.test.outputtest;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.nio.file.*;

public class OutputTest {
    public static void main(String[] args) {
        // Set I/O to use UTF-8
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, StandardCharsets.UTF_8));
        // Create the response listener
        Scanner input = new Scanner(System.in, StandardCharsets.UTF_8);
        System.out.println(Arrays.toString("èéëê".getBytes(StandardCharsets.UTF_8)));
        String temp = input.nextLine();
        System.out.println(Arrays.toString(temp.getBytes(StandardCharsets.UTF_8)));
    }
}
this is the output (after building the artifact "app.jar"):
PS C:\Users\[name]\Desktop\output_test> chcp 65001
Active code page: 65001
PS C:\Users\[name]\Desktop\output_test> java "-Dfile.encoding=UTF-8" -jar app.jar
[-61, -88, -61, -87, -61, -85, -61, -86]
èéëê
[0, 0, 0, 0]
The first array of bytes comes from the pre-written string; the second array contains the bytes of the input string. The fact that echo outputs accents correctly leads me to believe that this is a compiler error, but I'm not sure how to fix it. I've tried replacing the Scanner with Console, but that gave me the same error.
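To narrow down where the zeros come from, it may help to look at the raw bytes arriving on standard input before any Scanner or charset decoding touches them. The following is only a diagnostic sketch; the class name RawInputDump and the helper hexOfFirstLine are hypothetical. If the dump already shows 00 bytes, the zeros are present before Java decodes anything.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RawInputDump {
    /** Reads bytes up to the first line feed (or end of stream) and returns them as hex. */
    static String hexOfFirstLine(InputStream in) {
        StringBuilder hex = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b == '\r') {
                    continue; // skip the carriage return of a Windows line ending
                }
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%02X", b));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Reference: the UTF-8 bytes that "èéëê" should produce.
        byte[] expected = "\u00e8\u00e9\u00eb\u00ea\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("expected: " + hexOfFirstLine(new ByteArrayInputStream(expected)));
        // prints "expected: C3 A8 C3 A9 C3 AB C3 AA"

        // Pass any argument to also dump one line typed at the console.
        if (args.length > 0) {
            System.out.println("received: " + hexOfFirstLine(System.in));
        }
    }
}
```

The reference line corresponds to the first array in the output above: -61 and -88 are the signed forms of 0xC3 and 0xA8.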
When running inside IntelliJ, the ü is read completely normally when I input it in the terminal. This is another reason why I suspect a problem during compilation. When running with Command Prompt instead of PowerShell, the same error occurs.
Note: I'm using Windows Terminal running PowerShell, and IntelliJ IDEA Community Edition 2021.3. I have not edited the .xml files apart from the artifact build file path and some other project-specific file paths.
- OS: Windows 10 build 19045.2728
- Java version: 17.0.6 (Also in IntelliJ)
- Default codepage: 850 (OEM)
- Codepage in use when the error occurred: 65001 (UTF-8)
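Since the behaviour differs between IntelliJ and the packaged jar, it can also be useful to record which charset the JVM itself reports in each environment. This is a small sketch with a hypothetical class name, EncodingReport; it only reports the JVM's own settings and says nothing about the console code page, which still has to be checked with chcp.

```java
import java.nio.charset.Charset;

class EncodingReport {
    /** Returns the JVM's file.encoding property and default charset, one per line. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + System.lineSeparator()
                + "defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run once from IntelliJ and once via "java -jar app.jar" to compare.
        System.out.println(describe());
    }
}
```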