I have a file in ISO-8859-1 containing German umlauts, and I need to unmarshal it using JAXB. Before that, I need to convert the content to UTF-8.
    @Override
    public List<Usage> convert(InputStream input) {
        try {
            InputStream inputWithNamespace = addNamespaceIfMissing(input);
            inputWithNamespace = convertFileToUtf(inputWithNamespace);
            ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);
            ...
I get the "file" as an InputStream. My idea was to read the file's content in UTF-8 and make another InputStream to use. This is what I've tried:
    private InputStream convertFileToUtf(InputStream inputStream) throws IOException {
        byte[] bytesInIso = ByteStreams.toByteArray(inputStream);
        String stringIso = new String(bytesInIso);
        byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
        String stringUtf = new String(bytesInUtf);
        return new ByteArrayInputStream(bytesInUtf);
    }
I keep those two Strings only to inspect the contents. Even when I just read the ISO file, I get question marks (?) where the umlauts should be, and converting that to UTF-8 gives strange characters like ½ and so on.
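To isolate the conversion from the rest of the pipeline, here is a minimal sketch of the same ISO-8859-1 to UTF-8 step on a hard-coded input. The class name `CharsetCheck` and the helper `isoToUtf8` are made up for illustration, and `readAllBytes()` (Java 9+) stands in for Guava's `ByteStreams.toByteArray`. The difference from the snippet above is that every `new String(...)` names the charset the bytes are actually in. The one-argument `new String(bytes)` decodes with the platform default charset, which may be why the inspection strings look wrong even when the bytes themselves are fine.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class CharsetCheck {

    // Decodes the stream as ISO-8859-1 and returns a new stream of UTF-8 bytes.
    static InputStream isoToUtf8(InputStream in) throws IOException {
        byte[] isoBytes = in.readAllBytes(); // Java 9+; reads until end of stream
        String text = new String(isoBytes, StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // The three bytes for "äöü" in ISO-8859-1.
        byte[] isoBytes = {(byte) 0xE4, (byte) 0xF6, (byte) 0xFC};

        byte[] utfBytes = isoToUtf8(new ByteArrayInputStream(isoBytes)).readAllBytes();

        // Inspect with an explicit charset instead of the platform default.
        String inspected = new String(utfBytes, StandardCharsets.UTF_8);

        System.out.println(inspected.equals("\u00e4\u00f6\u00fc")); // true
        System.out.println(utfBytes.length); // 6, each umlaut takes two bytes in UTF-8
    }
}
```

If this prints the expected values but the real file still shows question marks, the problem is more likely in how the strings are inspected or printed (debugger or console encoding) than in the conversion itself.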
UPDATE
    byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
    String contentInIso = new String(bytesInIso);
    byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
    String contentInUtf = new String(bytesInUtf);
Inspecting contentInIso shows question marks instead of the umlauts, and contentInUtf has characters like "�" in their place.
    @Override
    public List<Usage> convert(InputStream input) {
        try {
            InputStream inputWithNamespace = addNamespaceIfMissing(input);
            byte[] bytesInIso = ByteStreams.toByteArray(inputWithNamespace);
            String contentInIso = new String(bytesInIso);
            byte[] bytesInUtf = new String(bytesInIso, ISO_8859_1).getBytes(UTF_8);
            String contentInUtf = new String(bytesInUtf);
            ORDR order = xmlUnmarshaller.unmarshall(inputWithNamespace, ORDR.class);
This convert method is called by another method, processUsageFile:
    private void processUsageFile(File usageFile) {
        try (FileInputStream fileInputStream = new FileInputStream(usageFile)) {
            usageImporterService.importUsages(usageFile.getName(), fileInputStream, getUsageTypeValidated(usageFile.getName()));
            log.info("Usage file {} imported successfully. Moving to archive directory", usageFile.getName());
If I take the code I wrote under the UPDATE heading and put it immediately after the try, the first string (contentInIso) has question marks, but contentInUtf has the umlauts. Then, once execution reaches convert, JAXB throws an exception saying that the file has a premature end of line.
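One possible explanation for the exception, offered as an assumption rather than a confirmed diagnosis: an InputStream can generally be read only once, so after the bytes have been pulled out for inspection, the same stream object has nothing left for the unmarshaller. The sketch below shows that behaviour in isolation. `StreamReuse` and `drain` are hypothetical names, and `readAllBytes()` (Java 9+) is used in place of Guava.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class StreamReuse {

    // Reads everything from the stream; the stream is left at end-of-stream.
    static byte[] drain(InputStream in) throws IOException {
        return in.readAllBytes();
    }

    public static void main(String[] args) throws IOException {
        InputStream original =
                new ByteArrayInputStream("<a/>".getBytes(StandardCharsets.UTF_8));

        byte[] bytes = drain(original);

        // The original stream is exhausted: read() signals end-of-stream.
        System.out.println(original.read()); // -1

        // A new stream over the saved bytes starts from the beginning again.
        InputStream fresh = new ByteArrayInputStream(bytes);
        System.out.println(fresh.read()); // 60, the code of '<'
    }
}
```

Under that assumption, anything that needs the content after it has been read once would have to receive a new ByteArrayInputStream built from the saved bytes, not the stream that was already consumed.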