I've prepared a Docker image that demonstrates the problem:

https://drive.google.com/uc?id=1i04_dVL0Rp5rxXCMuHaS4LYREkZjAAW1&export=download

This image is basically alpine:3.11 + apk add openjdk8 maven + my Maven project, which contains a minimal sample Java class that shows the problem.

You can try it with the following commands:

# docker load -i bugexample.img
# docker run -w /root/bug-example --name bugtest bugexample /bin/ash build-until-sha-different.sh

If you are lucky (it sometimes takes several attempts), you will get the following output:

Found! Sha1 of two subsequent otherwise identical builds are different!
--- 1.sha1
+++ 2.sha1
@@ -1,3 +1,3 @@
 d8d46555c93da579adefc629f1764965a5493edb  com/SimpleBug$1.class
 75007242aab1e1877d24124d432cb246a79476a8  com/SimpleBug$SimpleBugBuilder.class
-23e8d0ea909b95a7955e0ec0adb4d12ae2193dd1  com/SimpleBug.class
+6a303d69d3f382b23ca04caee4102ee1cd7151e3  com/SimpleBug.class

The core issue is that this build produces different bytecode almost every time it runs, even though nothing else (neither the environment nor the code itself) has changed.

When I compare the differing class files, I see that they differ in a single byte:
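The contents of build-until-sha-different.sh are not shown here, so the following is only a sketch of the kind of check it appears to perform: hash every .class file under a build output directory and compare the listings from two builds. The names `ClassDigests`, `sha1Hex`, and `digestLines` are made up for illustration, and the two directory arguments are assumed to be copies of the class output of two separate builds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original project.
class ClassDigests {

    /** Returns the SHA-1 digest of the given bytes as lowercase hex. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Produces one "digest  relative/path" line per .class file, sorted by path. */
    static List<String> digestLines(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> classFiles = paths
                    .filter(p -> p.toString().endsWith(".class"))
                    .sorted()
                    .collect(Collectors.toList());
            List<String> lines = new ArrayList<>();
            for (Path p : classFiles) {
                lines.add(sha1Hex(Files.readAllBytes(p)) + "  " + root.relativize(p));
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] and args[1]: class output directories saved from two builds (assumed).
        List<String> first = digestLines(Paths.get(args[0]));
        List<String> second = digestLines(Paths.get(args[1]));
        first.forEach(System.out::println);
        System.out.println(first.equals(second) ? "identical" : "different");
    }
}
```

Sorting the paths before hashing keeps the listing itself stable, so any difference in the output reflects a difference in file contents rather than in directory traversal order.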

# cmp -lb 1_SimpleBug.class 2_SimpleBug.class
4053  66 6     65 5
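For anyone without `cmp` at hand, the same comparison can be sketched in Java. This is an illustrative helper only (`ClassFileDiff` and `differingOffsets` are hypothetical names); it reports 1-based offsets, as `cmp -l` does, and only compares the common prefix if the files differ in length.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
class ClassFileDiff {

    /** Returns the 1-based offsets at which the two arrays differ. */
    static List<Integer> differingOffsets(byte[] a, byte[] b) {
        List<Integer> offsets = new ArrayList<>();
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                offsets.add(i + 1);
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        byte[] first = Files.readAllBytes(Paths.get(args[0]));
        byte[] second = Files.readAllBytes(Paths.get(args[1]));
        for (int offset : differingOffsets(first, second)) {
            // Print the offset and both byte values in decimal and hex.
            int x = first[offset - 1] & 0xff;
            int y = second[offset - 1] & 0xff;
            System.out.printf("%d: %d (0x%02x) vs %d (0x%02x)%n", offset, x, x, y, y);
        }
    }
}
```

Printing the byte values in both decimal and hex makes it easier to match them against the decimal constant pool indexes that javap prints.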

Digging deeper into the class file structure, I found that the difference comes from a constant pool pointer in a stack map frame (StackMapTable attribute -> stack_map_frame entry -> verification_type_info with tag Object_variable_info -> cpool_index).

1_SimpleBug.class
   #35 = Utf8               supSetStringParameter10
   #36 = Utf8               Lcom/google/common/base/Supplier;

2_SimpleBug.class
   #35 = Utf8               supSetStringParameter10
   #36 = Utf8               Lcom/google/common/base/Supplier;

So one file points to #35 and the other to #36. I don't think this is correct behavior.

I would like to submit this to the proper issue tracker, but I don't know how, since all the related JDK trackers are for developers only.
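To double-check what a given cpool_index resolves to in each file independently of javap, the constant pool can be read directly. The sketch below follows the class file layout from the JVM specification (magic, minor and major version, constant_pool_count, then tagged entries) and collects only the Utf8 entries. `ConstantPoolDump` and `utf8Entries` are hypothetical names, and the code does not parse the StackMapTable itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the original project.
class ConstantPoolDump {

    /** Returns an array indexed by constant pool index; non-Utf8 slots stay null. */
    static String[] utf8Entries(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor_version
            in.readUnsignedShort(); // major_version
            int count = in.readUnsignedShort(); // constant_pool_count
            String[] entries = new String[count];
            for (int i = 1; i < count; i++) { // index 0 is unused
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: entries[i] = in.readUTF(); break; // Utf8
                    case 3: case 4: in.skipBytes(4); break; // Integer, Float
                    case 5: case 6: in.skipBytes(8); i++; break; // Long, Double use two slots
                    case 7: case 8: case 16: case 19: case 20: in.skipBytes(2); break;
                    case 9: case 10: case 11: case 12: case 17: case 18: in.skipBytes(4); break;
                    case 15: in.skipBytes(3); break; // MethodHandle
                    default: throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: class file; remaining args: constant pool indexes to print (decimal).
        String[] entries = utf8Entries(Files.readAllBytes(Paths.get(args[0])));
        for (int i = 1; i < args.length; i++) {
            int index = Integer.parseInt(args[i]);
            System.out.println("#" + index + " = " + entries[index]);
        }
    }
}
```

Running it against both class files with the same indexes shows whether the two constant pools really hold the same entries at those positions.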

# java -version
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (IcedTea 3.15.0) (Alpine 8.242.08-r0)
OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)

# mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /usr/share/java/maven-3
Java version: 1.8.0_242, vendor: IcedTea, runtime: /usr/lib/jvm/java-1.8-openjdk/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-1050-kvm", arch: "amd64", family: "unix"

And here is an archive with the Java project:

https://drive.google.com/uc?id=1ZBdRzUk00QtpkGGnKAzipMjMtcnkKGN4&export=download

AngryGami
  • 7
    Nothing in the specification says that two subsequent builds have to produce identical bytecode. – Holger Feb 13 '20 at 13:34
  • 2
    This is not the way to ask a good question. Put a minimal reproducible example in the Question. (Think: why would we trust what you put in that docker image?) – Stephen C Feb 13 '20 at 13:44
  • 2
    Duplicate of https://stackoverflow.com/questions/14984984/is-the-creation-of-java-class-files-deterministic – Stephen C Feb 13 '20 at 13:48
  • 2
    If you really, want to submit a bug report use https://bugreport.java.com/bugreport/ but you will be wasting your time. This is not a bug. This is just how the Sun / Oracle / OpenJDK java compiler has always worked. – Stephen C Feb 13 '20 at 13:51
  • 1
    Also duplicate of https://stackoverflow.com/questions/12672232/identical-java-sources-compile-to-binary-differing-classes – Stephen C Feb 13 '20 at 13:56
  • @StephenC - I understand that you might not want to trust my image, but how else I can provide 100% reproducible environment? You may look inside image without executing anything with docker tooling. I was thinking that – AngryGami Feb 13 '20 at 14:24
  • 1
    You provide the source code and build file, and tell your OS and exact Java version / release. – Stephen C Feb 13 '20 at 14:29
  • @Holger Yep, I know that. Though of the >10000 Java files in our project, only one has demonstrated this behavior. I assume that the javac developers actually worked hard to make compilation deterministic even though the spec doesn't explicitly require it. If this is not the case, then the entire https://reproducible-builds.org/ initiative is a lost cause for Java. Are the Maven devs, who say that their latest version is reproducible (https://maven.apache.org/docs/3.6.3/release-notes.html), lying? – AngryGami Feb 13 '20 at 14:40
  • @StephenC I've seen both these questions that you mentioned as duplicates but neither of these talk about exact same source/environment repeatable builds, both accepted answers diverge into different compiler flags and whatnot. In this example everything is exactly the same for each build. Though thanks anyway. – AngryGami Feb 13 '20 at 14:40
  • 2
    @AngryGami the explanation of reproducible-builds.org indicates that this is indeed a waste of time. For any nontrivial software, vulnerabilities or backdoors can be hidden in plain sight in the source code, instead of injecting them during the build. If developers are even lazy enough not to compile the source code themselves, they surely did not inspect the source code for any threats. – Holger Feb 13 '20 at 14:50
  • 3
    [this answer](https://stackoverflow.com/a/41976196/2711488) addresses the fact that even the same version may produce different output under certain circumstances and, considering the premise of reproducible-builds.org, also mentions the point that even identical class files won’t produce identical jar files, as these do contain timestamps. And in Java, a build usually ends with a jar file or a binary with an embedded jar file… – Holger Feb 13 '20 at 14:56
  • @Holger :) I wouldn't be so pessimistic. Though, even in theory, a compiler is just a pure function source -> bytecode. There is absolutely no reason it should not be. Everything that is non-deterministic in this process could be avoided. – AngryGami Feb 13 '20 at 14:56
  • @Holger problems with jar files are a completely different topic and are actually already kinda solved for Maven builds – AngryGami Feb 13 '20 at 15:05
  • 2
    Anyway, [this comment](https://stackoverflow.com/questions/60207897/probable-bug-in-java-compiler-for-openjdk8-how-to-report-and-get-it-fixed#comment106496657_60207897) did already show where you can file a bug report. We’ll see how the `javac` developers will respond to the report. – Holger Feb 13 '20 at 15:35
  • Forget about that, it's not deterministic. I've noticed this myself more than once and I vaguely remember even seeing an issue on that in the OpenJDK bugtracker where it was explained why that isn't the case (can't remember, unfortunately). – Anlon Burke Feb 13 '20 at 19:01
  • 1
    A couple of reasons why `javac` may not be deterministic: 1) the order in which files are returned by the OS in a directory traversal may not be deterministic. 2) objects that use identityHashcode as their hash value will have non-deterministic hashes which is liable to affect the iteration order of hashmaps. These could (plausibly) lead to insignificant differences in `.class` files. (Fixes for both of these and similar would probably make the compiler slower.) – Stephen C Feb 14 '20 at 03:17
  • @StephenC 1) How could the order of files returned by the OS affect bytecode generation for a particular file? 2) I would argue that if this were the case, this problem would appear much, much more often. The difference that this issue demonstrates may even be a real bug, because as you can see the javap output says that both files have identical constant pools, while one file points to entry #35 and the other to entry #36, which have different values. Though, well, they may be pointers for different attributes, but why then is only one byte different? There must be another difference elsewhere. – AngryGami Feb 14 '20 at 07:37
  • 1
    See my updated answer in https://stackoverflow.com/questions/12672232/identical-java-sources-compile-to-binary-differing-classes. But I don't know why you want to argue about this. Submit the bug report and be done with it. We have told you where to submit it. (There is nothing >>we<< can do to fix this for you, and it is not up to us to decide if this is a bug or not.) – Stephen C Feb 14 '20 at 07:40
  • I just disagree with some of your arguments. There is no evidence that javac puts any timestamps or full source file paths (i.e. not relative to the package root directory) into class files. Compile-time constants are just a change in dependencies in disguise, which is expected to produce different bytecode, as are changes in the signatures of dependencies. The order of imported files shouldn't matter because we refer to them by name, and the only thing that matters is the order of import statements. So we are left with identityHashcode randomness, which doesn't seem to happen often for some reason. – AngryGami Feb 14 '20 at 08:31
  • 2
    @AnlonBurke that’s really unfortunate, as a record containing an explicit statement would be exactly what is needed here… – Holger Feb 14 '20 at 09:56
  • 1
    @StephenC dependencies on the directory traversal order can be solved, see [JDK-7003006](https://bugs.openjdk.java.net/browse/JDK-7003006). – Holger Feb 14 '20 at 09:58
  • 1
    @AngryGami - *"I just disagree with some of your agruments."* - OK. However, it is immaterial whether you agree with me or not. It is not my opinion that matters. – Stephen C Feb 14 '20 at 10:30

1 Answer

Well... The Mountain in Labour...

All of this was for nothing. Maven was the main culprit. The first time I run my build, it downloads a lot of things that Maven itself requires (apart from my project's dependencies) and then starts compilation in the same JVM. The second time the build runs, nothing needs to be downloaded, so compilation starts immediately. I don't know exactly why, but the first invocation of the compiler is somehow affected by the state of the JVM, and this causes it to produce slightly different bytecode.

The solution was to add <fork>true</fork> to the maven-compiler-plugin configuration in my pom.xml file. The build is now totally reproducible... though it takes ~2x more time :)

I still believe this should not happen even when the compilation runs inside the Maven JVM, and it may still be a topic for improvement in javac, but this "workaround" is acceptable.

AngryGami