Read file as the same Byte Array across operating systems

Question

I have a Spring Boot app that runs from a WAR file, and I test it in different Operating Systems. It reads a file from the "resources" directory (this is the path within the WAR: "/WEB-INF/classes/file-name") by performing the following steps:

1. Get the InputStream:

InputStream fileAsInputStream = new ClassPathResource("file-name").getInputStream();

2. Convert the InputStream to ByteArray:

byte[] fileAsByteArray = IOUtils.toByteArray(fileAsInputStream);

The issue is that the content of the resulting byte array differs between operating systems, causing inconsistencies in further operations. The difference lies in the line endings the file contains: "\n" = 0x0A = 10 on Linux, and "\r\n" = 0x0D 0x0A = 13 10 on Windows. The differences across operating systems are explained, for example, in this thread: What are the differences between char literals '\n' and '\r' in Java?

This is an example where the Byte Arrays have different contents:

When app runs on Linux: [114, 115, 97, 10, 69, 110] => 10 is the "\n"
When app runs on Windows: [114, 115, 97, 13, 10, 69, 110] => 13 10 is the "\r\n"

So the single byte 10 on Linux corresponds to the pair 13 10 on Windows. The file itself is always the same (it is a private key used for establishing the SFTP communication). The only thing that differs is the operating system on which the code runs.

Is there a solution to obtain the same Byte Array, regardless of the Operating System from which the code is being run?

One workaround that works is to normalize the line endings to the same bytes: iterate through the array and replace each lone 10 (for example) with 13 10.
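The workaround described above can be sketched as follows. This is only an illustration under a few assumptions: it normalizes in the opposite direction (13 10 becomes 10, which avoids growing the array), it is only safe for text content such as a PEM-style key, and the class and method names are hypothetical. It compiles on Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper, not part of the original application.
final class LineEndingNormalizer {

    /** Returns a copy of the input in which every CR LF pair (13 10) is replaced by a single LF (10). */
    public static byte[] toLf(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            boolean crBeforeLf = input[i] == '\r'
                    && i + 1 < input.length
                    && input[i + 1] == '\n';
            if (crBeforeLf) {
                continue; // drop the CR; the LF is written on the next iteration
            }
            out.write(input[i]);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // The two arrays from the question.
        byte[] linux = {114, 115, 97, 10, 69, 110};
        byte[] windows = {114, 115, 97, 13, 10, 69, 110};
        System.out.println(Arrays.equals(toLf(linux), toLf(windows))); // prints "true"
    }
}
```

A lone CR that is not followed by LF is left untouched here. Whether that is the right behavior depends on the file format.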

Another attempt was to use StandardCharsets.UTF_8, but the elements of the arrays still differ:

IOUtils.toByteArray(new BufferedReader(new InputStreamReader(new ClassPathResource("file-name").getInputStream())), StandardCharsets.UTF_8)
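A charset only decides how bytes map to characters. It does not decide which line terminator a file uses, so decoding and re-encoding as UTF-8 leaves 13 10 in place. The sketch below shows this with the JDK alone; the class and method names are illustrative and not taken from the application. Note also that the InputStreamReader in the snippet above is created without a charset, so on Java 8 it decodes with the platform default rather than UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: shows that a UTF-8 round trip does not touch line endings.
final class CharsetRoundTrip {

    /** Decodes the bytes as UTF-8 text and encodes the text back to UTF-8 bytes. */
    public static byte[] roundTrip(byte[] raw) {
        String decoded = new String(raw, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] windows = {114, 115, 97, 13, 10, 69, 110};
        // CR (13) and LF (10) are ordinary characters in UTF-8 and survive unchanged.
        System.out.println(Arrays.toString(roundTrip(windows)));
        // prints "[114, 115, 97, 13, 10, 69, 110]"
    }
}
```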

Are you testing with the exact same war file in both cases, or are you *creating* the war file on each system before the test? — Jon Skeet, Jan 06 '23 at 18:19
The WAR file is generated via Maven, through `mvn install`, from the exact same source code, when the app is being deployed. The values from the example were extracted at runtime, through remote debugging. — Pop Alexandru, Jan 06 '23 at 18:24
Okay, I'm suspicious of that then... if you can generate the war file once and then deploy *that exact file* on multiple systems, then you remove the potential for "exact same source code" to actually mean "modulo what a git clone might do to line endings" for example. — Jon Skeet, Jan 06 '23 at 18:27
The WAR file is generated on each cloud instance (*each OS*) during deployment, resulting in a WAR file for each OS. But the WAR file is generated based on the same source code. I mean, the `mvn install` command is executed on the same source code, and produces a WAR file for each OS where that command is executed. — Pop Alexandru, Jan 06 '23 at 18:43
Are you saying you have *absolutely no way* of taking the same physical war file and deploying it on multiple systems? It doesn't matter whether that's how you *normally* deploy or not - this is for diagnostic purposes. Because I strongly suspect that if you do that, you'll see the same bytes on all OSes - but you should test it. — Jon Skeet, Jan 06 '23 at 18:47
If you want to implement a solution that would work with both flavors of the WAR file you'd probably need to use or implement a custom "FilterReader" (maybe based/extending a LineReader) that would either replace/fix the character or read "lines" and just skip the line separator (this is just for the case you need to continue using different WAR files) — Ale Zalazar, Jan 06 '23 at 19:05
*byte[] fileAsByteArray = IOUtils...* is a black box to us. Why not `InputStream.readAllBytes`? Varying line separators have no bearing on a binary data read — g00se, Jan 07 '23 at 11:31
@JonSkeet thank you for the suggestion, but we cannot deploy our app like that. But it could be a valid solution for other people. — Pop Alexandru, Jan 18 '23 at 14:56
@AleZalazar thanks for the suggestion. This is how we will proceed. — Pop Alexandru, Jan 18 '23 at 14:57
@g00se thanks, but we use Java 8 and that method needs Java 9 — Pop Alexandru, Jan 18 '23 at 14:59
It wasn't intended to be a solution. The point of the suggestion wasn't that that would fix the problem, and I wasn't suggesting that you do that in production - the point was *purely for diagnostics*. — Jon Skeet, Jan 18 '23 at 14:59
OK fair enough but the point is that there should be no difference between the binary form between OSs, *unless* they are *starting out different*. Plus, for most normal purposes, i.e. text file processing, differences in line separators shouldn't really be causing a problem — g00se, Jan 18 '23 at 15:03
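The diagnostic discussed in the comments can be approximated by fingerprinting the resource on each system and comparing the results. The sketch below is Java 8 compatible and uses only the JDK. It rests on some assumptions: it loads the resource through the plain class loader rather than Spring's ClassPathResource, "file-name" is the placeholder from the question, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical diagnostic helper, not part of the original application.
final class ResourceFingerprint {

    /** Java 8 replacement for InputStream.readAllBytes (which needs Java 9). */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Returns the SHA-256 digest of the data as lowercase hex. */
    public static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Counts how often a given byte value occurs in the data. */
    public static int count(byte[] data, byte value) {
        int hits = 0;
        for (byte b : data) {
            if (b == value) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // "file-name" is the placeholder resource name used in the question.
        try (InputStream in = ResourceFingerprint.class.getClassLoader().getResourceAsStream("file-name")) {
            if (in == null) {
                System.out.println("resource not found on the classpath");
                return;
            }
            byte[] bytes = readAll(in);
            System.out.println("length = " + bytes.length);
            System.out.println("CR (13) count = " + count(bytes, (byte) 13));
            System.out.println("LF (10) count = " + count(bytes, (byte) 10));
            System.out.println("SHA-256 = " + sha256Hex(bytes));
        }
    }
}
```

If the hash and the CR count differ between the two systems, the files packaged into the two WARs were already different before the application read them, for example because the checkout converted line endings, as suggested in the comments.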
