95

When using the same JDK (i.e. the same javac executable), are the generated class files always identical? Can there be a difference depending on the operating system or hardware? Apart from the JDK version, could any other factors result in differences? Are there any compiler options to avoid differences? Is a difference only possible in theory, or does Oracle's javac actually produce different class files for the same input and compiler options?

Update 1 I'm interested in the generation, i.e. compiler output, not whether a class file can be run on various platforms.

Update 2 By 'Same JDK', I also mean the same javac executable.

Update 3 Distinction between theoretical difference and practical difference in Oracle's compilers.

[EDIT, adding paraphrased question]
"What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?"

mstrap
  • My question is about the creation, not whether they can be used on all platforms. – mstrap Feb 20 '13 at 16:35
  • 5
    @Gamb CORA does *not* mean that the byte code will be exactly the same if compiled on different platforms; all it means is that the generated byte code will do exactly the same thing. – Sergey Kalinichenko Feb 20 '13 at 16:35
  • @Adel Do you have a reference for that? I know that it is [definitely **not** the case for C#](http://blogs.msdn.com/b/ericlippert/archive/2012/05/31/past-performance-is-no-guarantee-of-future-results.aspx), so would love to see a reference stating it is the case for Java. I'm particularly thinking that a multi-threaded compiler might assign different identifier names on different runs. – RB. Feb 20 '13 at 16:38
  • 11
    Why do you care? This smells like [a XY Problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – Joachim Sauer Feb 20 '13 at 16:41
  • The interface that a developer uses is the same for all platforms. The underlying binary is different for each platform. In other words, you can run the same java program anywhere but you cannot take the java binaries (the contents of jdk/bin folder) meant for one OS and put it on another. – chettyharish Feb 20 '13 at 16:43
  • 4
    @JoachimSauer Consider if you version control your binaries - you might want to detect changes only if the source code had changed, but you would know this was not a sensible idea if the JDK can arbitrarily change the output binaries. – RB. Feb 20 '13 at 16:44
  • 7
    @RB.: the compiler is allowed to produce any conforming byte code that represents the compiled code. In fact, some compiler updates fix bugs that produce slightly different code (usually with the same runtime behaviour). In other words: if you want to *detect* source changes, *check for* source changes. – Joachim Sauer Feb 20 '13 at 16:48
  • It looks like you can find a verified answer if you have access to the [expert exchange site](http://www.experts-exchange.com/Programming/Languages/Java/Q_20678283.html) (which I do not have). – Sergey Kalinichenko Feb 20 '13 at 16:53
  • 3
    @dasblinkenlight: you're assuming that the answer that they claim to have is actually correct and up-to-date (doubtful, given that the question is from 2003). – Joachim Sauer Feb 20 '13 at 17:02
  • 1
    Is another way to ask your question, "What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?" For example, one that uses an AMD CPU vs. another that uses an Intel CPU. – Kelly S. French Feb 20 '13 at 17:20
  • @Kelly S. French Yes. – mstrap Feb 20 '13 at 17:22
  • will be interesting see how somebody includes http://en.wikipedia.org/wiki/HotSpot in his answer – osdamv Feb 20 '13 at 18:26
  • Great question! This seems strangely related to the Halting Problem. Essentially, you can't prove a negative. – Kelly S. French Mar 04 '13 at 15:21

11 Answers

70

Let's put it this way:

I can easily produce an entirely conforming Java compiler that never produces the same .class file twice, given the same .java file.

I could do this by tweaking all kinds of bytecode construction or by simply adding superfluous attributes to my method (which is allowed).

Given that the specification does not require the compiler to produce byte-for-byte identical class files, I'd avoid depending on such a result.

However, the few times that I've checked, compiling the same source file with the same compiler with the same switches (and the same libraries!) did result in the same .class files.
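A check like the one described above can be scripted. The following is only a minimal sketch, assuming it runs on a JDK (so that `ToolProvider.getSystemJavaCompiler()` is available); the class name `RepeatCompileCheck` and the tiny `Hello` source are made up for illustration. It compiles the same source twice into separate temporary directories and compares the resulting bytes. Whatever it prints only describes that one compiler, on that one machine, for that one input.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: compiles one source string and returns the class file bytes.
class RepeatCompileCheck {

    static byte[] compileOnce(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; run this on a JDK");
        }
        try {
            // Each run gets its own directory so the runs cannot influence each other.
            Path dir = Files.createTempDirectory("repeat-compile");
            Path sourceFile = dir.resolve(className + ".java");
            Files.write(sourceFile, source.getBytes(StandardCharsets.UTF_8));

            // Same switches on every run; null streams mean "use the defaults".
            int status = compiler.run(null, null, null,
                    "-encoding", "UTF-8", "-d", dir.toString(), sourceFile.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }
            return Files.readAllBytes(dir.resolve(className + ".class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = "class Hello { static int twice(int x) { return 2 * x; } }";
        byte[] first = compileOnce("Hello", source);
        byte[] second = compileOnce("Hello", source);
        System.out.println("byte-for-byte identical: " + Arrays.equals(first, second));
    }
}
```

To compare platforms, the same program would have to be run on each machine and the bytes (or a hash of them) compared afterwards.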

Update: I've recently stumbled over this interesting blog post about the implementation of switch on String in Java 7. In this blog post, there are some relevant parts, that I'll quote here (emphasis mine):

In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.

This pretty clearly illustrates the issue: The compiler is not required to act in a deterministic manner, as long as it matches the spec. The compiler developers, however, realize that it's generally a good idea to try (provided it's not too expensive, probably).
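To illustrate the point in the quoted blog post, here is a small sketch (not javac's actual code; the class name `IterationOrderDemo` and the sample keys are made up). A `LinkedHashMap` iterates in insertion order by contract, while the iteration order of a `HashMap` is an implementation detail of the library and is not specified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {

    // Inserts the keys in the given order and reports the order the map iterates them in.
    static List<String> keysInIterationOrder(Map<String, Integer> map, String... keys) {
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], i);
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        String[] labels = {"zulu", "alpha", "mike", "bravo"};

        // Guaranteed by the LinkedHashMap contract: insertion order.
        System.out.println("LinkedHashMap: "
                + keysInIterationOrder(new LinkedHashMap<String, Integer>(), labels));

        // Not guaranteed: depends on hash codes and on the HashMap implementation.
        System.out.println("HashMap:       "
                + keysInIterationOrder(new HashMap<String, Integer>(), labels));
    }
}
```

A compiler that emitted constant pool entries or synthetic members in `HashMap` iteration order could therefore produce different, equally valid, class files after a library update.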

Joachim Sauer
  • @GaborSch what is it missing? "What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?" basically depending on the whim of the group that produced the compiler – emory Feb 21 '13 at 00:32
  • @emory Probably you missed, but that was the first answer. Then OP updated the question, because the answer was too *theoretical*. After **updating** OP is asking **practical example**, but I cannot find any traces of that here. Of course *theoretically* I can write any compiler, but I *will not* write because *there is no sense* doing so. The answer **does not show** examples. – gaborsch Feb 21 '13 at 00:43
  • 3
    Well, for me this would be reason enough not to depend on it: an updated JDK could break my build/archival system if I depended on the fact that the compiler always produces the same code. – Joachim Sauer Feb 21 '13 at 08:08
  • 3
    @GaborSch: you already have a perfectly fine example of such a situation, so some additional view on the problem was in order. There's no sense in duplicating your work. – Joachim Sauer Feb 21 '13 at 09:00
  • @JoachimSauer I agree, you should never depend on the bytecode created. I suppose that the root problem was that OP wanted to move the build environment from one OS to another, and he was curious about the possible side-effects - even theoretical and practical. – gaborsch Feb 21 '13 at 09:16
  • 1
    @GaborSch The root problem is that we want to implement an efficient "online update" of our application for which users would only fetch modified JARs from the website. I can create identical JARs having identical class files as input. But the question is whether class files are always identical when compiled from the same source files. Our whole concept stands and fails with this fact. – mstrap Feb 21 '13 at 09:57
  • 2
    @mstrap: so it is an XY Problem after all. Well, you can look into differential updates of jars (so even one-byte-differences would not cause the whole jar to be redownloaded) and you should provide explicit version numbers to your releases anyway, so that whole point is moot, in my opinion. – Joachim Sauer Feb 21 '13 at 10:03
  • @mstrap If you have an incremental build system which compiles only the modified `.java` files, you only have a few `.class` files modified. That would be a good building and deployment process. Don't rely on the compiler. If the class files are such precious artifacts for you, maybe you should consider putting them into version control (which is quite a bad practice otherwise, as @DonalFellows pointed out) – gaborsch Feb 21 '13 at 10:14
  • @JoachimSauer We have build numbers and release numbers, but that doesn't solve the problem: build 58 checks the update server, sees build 59 and detects 5 JAR files with different hash code, i.e. needs to pull these 5 JAR files. – mstrap Feb 21 '13 at 13:41
  • @mstrap: so? the build-server should be able to generate and provide a binary diff of the 5 jars between build 58 and 59, which the client can download and apply. This way little differences don't hurt a lot and your system becomes much more stable and resilient to small changes. There are many libraries that implement differential compression (think "rsync") in Java. – Joachim Sauer Feb 21 '13 at 13:42
  • @GaborSch We don't have an incremental build system and class files are not versioned at all. They are not precious, I just want to make sure that class files for the same sources remain identical (if there is anything I can do about that -- that's why I was asking for compiler options and anything else I would not even have thought about). – mstrap Feb 21 '13 at 13:44
  • @JoachimSauer In the first place, I want to avoid JAR files for the same sources being different. As large parts of our codebase seldom change, this will significantly reduce the number of 'changed' JAR files per build. After that, we could consider sending only compressed differences. I'm not sure whether this will work well for JAR files, as small changes in the source code may result in completely changed JAR (ZIP) files, but it's definitely worth a try. – mstrap Feb 21 '13 at 13:47
  • @mstrap Hard job, you'd better change your build system. If you want to tweak the compiler options, you are touching this field anyway. – gaborsch Feb 21 '13 at 13:50
  • @mstrap: but if the source doesn't change, why is there a separate release from that part of the code? It sounds like you have one large group of source files with no internal structure (or at least not producing separate jar files) ... – Joachim Sauer Feb 21 '13 at 13:53
  • @JoachimSauer your suggestion is interesting (to send the changed classes only, packaged into 1 `diff_59.jar`), but I think - considering maintainability aspects - it is better to send whole packages (think about class redefinitions, class deletions). – gaborsch Feb 21 '13 at 13:58
  • @GaborSch: I'm not talking about class-level granularity. Just push the old and new jar through the rsync algorithm and provide the result. Then you can reconstruct a byte-for-byte copy of the new jar on the client that is to be updated. (i.e. send `someJar_58_59.patch`). – Joachim Sauer Feb 21 '13 at 14:09
39

There is no obligation for the compilers to produce the same bytecode on each platform. You should consult the different vendors' javac utility to have a specific answer.


I will show a practical example for this with file ordering.

Let's say that we have 2 jar files: my1.jar and My2.jar. They're put in the lib directory, side by side. The compiler reads them in alphabetical order (since they are in lib), but the order is my1.jar, My2.jar when the file system is case insensitive, and My2.jar, my1.jar if it is case sensitive.

The my1.jar has a class A.class with a method

public class A {
     public static void a(String s) {}
}

The My2.jar has the same A.class, but with different method signature (accepts Object):

public class A {
     public static void a(Object o) {}
}

It is clear that if you have a call

String s = "x"; 
A.a(s); 

it will compile to a method call with a different signature in each case. So, depending on the case sensitivity of your file system, you will get a different class file as a result.
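The two ingredients of this example can be reproduced in isolation. The sketch below is an illustration only: `ClasspathOrderDemo` is a made-up name, and it does not show how javac actually enumerates a directory (that order may simply be unspecified). It sorts the two jar names with a case-sensitive and a case-insensitive comparator, and prints the JVM method descriptors that a call to `A.a(s)` would be compiled against in each case.

```java
import java.lang.invoke.MethodType;
import java.util.Arrays;

class ClasspathOrderDemo {

    // Sorts file names the way a case-sensitive or case-insensitive listing might.
    static String[] sorted(boolean caseSensitive, String... names) {
        String[] copy = names.clone();
        if (caseSensitive) {
            Arrays.sort(copy);                                // 'M' (77) sorts before 'm' (109)
        } else {
            Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER); // then '1' sorts before '2'
        }
        return copy;
    }

    // JVM descriptor of "static void a(<parameterType>)", as stored at the call site.
    static String descriptorOfA(Class<?> parameterType) {
        return MethodType.methodType(void.class, parameterType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println("case sensitive:   " + Arrays.toString(sorted(true, "my1.jar", "My2.jar")));
        System.out.println("case insensitive: " + Arrays.toString(sorted(false, "my1.jar", "My2.jar")));

        // The call site records the descriptor of whichever A the compiler resolved first.
        System.out.println("A from my1.jar: a" + descriptorOfA(String.class));
        System.out.println("A from My2.jar: a" + descriptorOfA(Object.class));
    }
}
```

Because the descriptor is written into the constant pool of the calling class, the two orderings give class files that differ in their bytes even though the source of the caller is unchanged.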

gaborsch
  • 1
    +1 There are myriad differences between the Eclipse compiler and javac, for example [how synthetic constructors are generated](http://stackoverflow.com/questions/14266052/issue-with-constructors-of-nested-class/14267348#14267348). – Paul Bellora Feb 20 '13 at 16:47
  • 2
    @GaborSch I'm interested in whether the byte code is identical for the same JDK, i.e. the same javac. I'll make that clearer. – mstrap Feb 20 '13 at 16:49
  • 2
    @mstrap I understood your question, but the answer is still the same: it depends on the vendor. The `javac` is not the same, because you have different binaries on each platform (e.g. Win7, Linux, Solaris, Mac). For a vendor, it does not make sense to have different implementations, but any platform-specific issue can influence the result (e.g. file ordering in a directory (think of your `lib` directory), endianness, etc.). – gaborsch Feb 20 '13 at 16:53
  • 1
    Usually, most of `javac` is implemented in Java (and `javac` is just a simple native launcher), so *most* platform differences should have no impact. – Joachim Sauer Feb 20 '13 at 16:57
  • 2
    @mstrap - the point he is making is that there is no *requirement* for any vendor to make their compiler produce exactly the same bytecode across platforms, only that the resulting bytecode produces the same results. Given there is no standard/spec/requirement the answer to your question is "It depends on the specific vendor, compiler, and platform". – Brian Roach Feb 20 '13 at 16:57
  • @JoachimSauer That is still not a requirement to implement `javac` in Java. – gaborsch Feb 20 '13 at 16:58
  • It is clear that there are no obligations to generate the same bytecode. But there are also (apparently) no reasons to generate different bytecode. `javac` is written in Java, so it is the same for all platforms, the files are also the same, and the result is probably also the same, except for JNI or something – Suzan Cioc Feb 20 '13 at 16:59
  • 1
    @GaborSch: of course there's no requirement, but you mentioned different binaries, so I wanted to mention the fact that *usually* the Java compiler is actually implemented in Java itself. – Joachim Sauer Feb 20 '13 at 17:01
  • @SuzanCioc I also told the same. But there is an implementation, and - because compilation is quite complex - it depends on many factors, e.g. file order. – gaborsch Feb 20 '13 at 17:03
  • 1
    @JoachimSauer you are right, +1 for the comment, but there is a binary code somewhere deep (a JRE in our case), which is written in C, uses platform specific code, etc. Usually we don't suspect that this would influence the code generation, but there is a (not so) theoretical possibility for that. – gaborsch Feb 20 '13 at 17:06
  • @GaborSch the same JDK means the same vendor and compiler. The point about the platform is interesting though. Has anyone seen Oracles JDK 1.7.0_11 produce a different class file on Windows than on Linux? – mstrap Feb 20 '13 at 17:10
  • 1
    @mstrap You asked the theoretical question, so in theory the answer is **no**. But *in practice* most vendors produce the same code, but there is no guarantee for that. How much would you stake that the first version of JDK8 will produce the same bytecode on all platforms? – gaborsch Feb 20 '13 at 17:19
  • 1
    @GaborSch Good point. Actually, I'm more interested in the practical answer. I've once again updated my question regarding this aspect. – mstrap Feb 20 '13 at 17:32
  • @mstrap Updated my answer with the example. – gaborsch Feb 20 '13 at 17:57
6

Short Answer - NO


Long Answer

The bytecode need not be the same on different platforms. It's the JRE (Java Runtime Environment) that knows exactly how to execute the bytecode.

If you go through the Java VM specification, you'll see that nothing requires the bytecode to be the same on different platforms.

Going through the class file format, it shows the structure of a class file as

ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

Checking about the minor and major version

minor_version, major_version

The values of the minor_version and major_version items are the minor and major version numbers of this class file. Together, a major and a minor version number determine the version of the class file format. If a class file has major version number M and minor version number m, we denote the version of its class file format as M.m. Thus, class file format versions may be ordered lexicographically, for example, 1.5 < 2.0 < 2.1. A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤ Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support. [1]

Reading more through the footnotes

[1] The Java virtual machine implementation of Sun's JDK release 1.0.2 supports class file format versions 45.0 through 45.3 inclusive. Sun's JDK releases 1.1.X can support class file formats of versions in the range 45.0 through 45.65535 inclusive. Implementations of version 1.2 of the Java 2 platform can support class file formats of versions in the range 45.0 through 46.0 inclusive.

So, investigating all this shows that the class files generated on different platforms need not be identical.
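The first three items of the ClassFile structure quoted above are easy to inspect. This is a minimal sketch, with `ClassFileHeader` as a made-up name; it reads `magic`, `minor_version` and `major_version` in the order and widths given in the structure (u4, u2, u2, big-endian) and nothing else.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    final int magic;   // u4 magic
    final int minor;   // u2 minor_version
    final int major;   // u2 major_version

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Reads the first 8 bytes of a class file; DataInputStream is big-endian, like the format.
    static ClassFileHeader read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own class file, if it is reachable as a resource.
        try (InputStream in = ClassFileHeader.class.getResourceAsStream("/ClassFileHeader.class")) {
            if (in == null) {
                System.out.println("Class file not available as a resource");
                return;
            }
            ClassFileHeader header = read(in);
            System.out.println(String.format("magic=0x%08X version=%d.%d",
                    header.magic, header.major, header.minor));
        }
    }
}
```

The version numbers printed depend on the compiler and its target setting, not on the operating system, which is the distinction raised in the comments below.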

Community
mtk
  • Can you give a more detailed link please? – mstrap Feb 20 '13 at 16:58
  • I think by 'platform' they are referring to the Java platform, not the operating system. Of course, when instructing javac 1.7 to create 1.6-compatible class files, there will be a difference. – mstrap Feb 20 '13 at 17:17
  • @mtk +1 to show how many properties are generated for a single class during the compilation. – gaborsch Feb 21 '13 at 12:02
3

Firstly, there's absolutely no such guarantee in the spec. A conforming compiler could stamp the time of compilation into the generated class file as an additional (custom) attribute, and the class file would still be correct. It would however produce a byte-level different file on every single build, and trivially so.

Secondly, even without such nasty tricks about, there's no reason to expect a compiler to do exactly the same thing twice in a row unless both its configuration and its input are identical in the two cases. The spec does describe the source filename as one of the standard attributes, and adding blank lines to the source file could well change the line number table.

Thirdly, I've never encountered any difference in build due to the host platform (other than that which was attributable to differences in what was on the classpath). The code which would vary based on platform (i.e., native code libraries) isn't part of the class file, and the actual generation of native code from the bytecode happens after the class is loaded.

Fourthly (and most importantly) it reeks of a bad process smell (like a code smell, but for how you act on the code) to want to know this. Version the source if possible, not the build, and if you do need to version the build, version at the whole-component level and not on individual class files. For preference, use a CI server (such as Jenkins) to manage the process of turning source into runnable code.

Donal Fellows
2

I believe that, if you use the same JDK, the generated byte code will always be the same, regardless of the hardware and OS used. The byte code is produced by the Java compiler, which uses a deterministic algorithm to "transform" the source code into byte code. So the output will always be the same. Under these conditions, only a change to the source code will affect the output.

viniciusjssouza
  • 3
    Do you have a reference for this though? As I said in the question comments, [this is definitely **not** the case for C#](http://blogs.msdn.com/b/ericlippert/archive/2012/05/31/past-performance-is-no-guarantee-of-future-results.aspx), so would love to see a reference stating it is the case for Java. I'm particularly thinking that a multi-threaded compiler might assign different identifier names on different runs. – RB. Feb 20 '13 at 16:46
  • 1
    This is the answer to my question and what I'd expect, however I agree with RB that a reference for that would be important. – mstrap Feb 20 '13 at 16:53
  • I believe the same. I don't think you will find a definitive reference. If it is important to you then you can do a study. Collect a bunch of the leading ones and try them out on different platforms compiling some open source code. Compare the byte files. Publish the result. Be sure to put a link here. – emory Feb 21 '13 at 00:36
1

Java allows you to write and compile code on one platform and run it on a different platform. AFAIK, this is only possible when the class files generated on different platforms are the same, or technically the same, i.e. identical.

Edit

What I mean by "technically the same" is that they don't need to be exactly the same if you compare them byte by byte.

So, as per the specification, the .class files of a class on different platforms don't need to match byte-by-byte.

rai.skumar
  • I'm interested whether they are *identical*. – mstrap Feb 20 '13 at 16:55
  • The OP's question *was* whether the class files were the same or "technically the same". – bdesham Feb 20 '13 at 16:55
  • and reply is yes. what i mean is they might not be same if you compare byte by byte, thats why i used the word technically same. – rai.skumar Feb 20 '13 at 17:05
  • @bdesham he wanted to know if they are identical. not sure what you understood by "technically the same"...is that the reason for downvote ? – rai.skumar Feb 20 '13 at 17:09
  • @rai.skumar Your answer basically says, "Two compilers will always produce output that behaves the same." Of course this is true; it's the whole motivation of the Java platform. The OP wanted to know whether the emitted code was *byte for byte identical*, which you did not address in your answer. – bdesham Feb 20 '13 at 17:13
  • i think i have explained in my first comment about same...my bad if my post doesn't say it explicitly. – rai.skumar Feb 20 '13 at 17:18
1

Overall, I'd have to say there is no guarantee that the same source will produce the same bytecode when compiled by the same compiler but on a different platform.

I'd look into scenarios involving different languages (code-pages), for example Windows with Japanese language support. Think multi-byte characters; unless the compiler always assumes it needs to support all languages it might optimize for 8-bit ASCII.

There is a section on binary compatibility in the Java Language Specification.

Within the framework of Release-to-Release Binary Compatibility in SOM (Forman, Conner, Danforth, and Raper, Proceedings of OOPSLA '95), Java programming language binaries are binary compatible under all relevant transformations that the authors identify (with some caveats with respect to the addition of instance variables). Using their scheme, here is a list of some important binary compatible changes that the Java programming language supports:

• Reimplementing existing methods, constructors, and initializers to improve performance.

• Changing methods or constructors to return values on inputs for which they previously either threw exceptions that normally should not occur or failed by going into an infinite loop or causing a deadlock.

• Adding new fields, methods, or constructors to an existing class or interface.

• Deleting private fields, methods, or constructors of a class.

• When an entire package is updated, deleting default (package-only) access fields, methods, or constructors of classes and interfaces in the package.

• Reordering the fields, methods, or constructors in an existing type declaration.

• Moving a method upward in the class hierarchy.

• Reordering the list of direct superinterfaces of a class or interface.

• Inserting new class or interface types in the type hierarchy.

This chapter specifies minimum standards for binary compatibility guaranteed by all implementations. The Java programming language guarantees compatibility when binaries of classes and interfaces are mixed that are not known to be from compatible sources, but whose sources have been modified in the compatible ways described here. Note that we are discussing compatibility between releases of an application. A discussion of compatibility among releases of the Java SE platform is beyond the scope of this chapter.

Kelly S. French
  • That article discusses what can happen if we change the Java version. The OP's question was what can happen if we change platform within the same Java version. Otherwise it's a good catch. – gaborsch Feb 20 '13 at 18:24
  • 1
    It's as close as I could find. There is an odd hole between the spec of the language and the spec of the JVM. So far, I'd have to answer the OP with 'there is no guarantee that the same java compiler will produce the same bytecode when run on a different platform.' – Kelly S. French Feb 20 '13 at 18:30
1

For the question:

"What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?"

The Cross-Compilation example shows how we can use the javac option: -target version

This flag generates class files which are compatible with the Java version we specify when invoking the command. Hence the class files will differ depending on the arguments we supply to this option during compilation.

0

Most probably, the answer is "yes", but to have precise answer, one does need to search for some keys or guid generation during compiling.

I can't remember the situation where this occurs. For example to have ID for serializing purposes it is hardcoded, i.e. generated by programmer or IDE.

P.S. Also JNI can matter.

P.P.S. I found that javac is itself written in java. This means that it is identical on different platforms. Hence it would not generate different code without a reason. So, it can do this only with native calls.

Suzan Cioc
  • Note that Java does not shield you from *all* platform differences. The order of files returned when listing directory content is not defined, and this *could* conceivably have some impact on a compiler. – Joachim Sauer Feb 20 '13 at 16:58
0

I would put it another way.

First, I think the question is not about being deterministic:

Of course it is deterministic: randomness is hard to achieve in computer science, and there is no reason a compiler would introduce it here for any reason.

Second, if you reformulate it as "how similar are the bytecode files for the same source file?", then no, you can't rely on them being similar.

A good way of making sure of this, is by leaving the .class (or .pyc in my case) in your git stage. You'll realize that among different computers in your team, git notices changes between .pyc files, when no changes were brought to the .py file (and .pyc recompiled anyway).

At least that's what I observed. So put *.pyc and *.class in your .gitignore !

Augustin Riedinger
0

There are two questions.

Can there be a difference depending on the operating system or hardware? 

This is a theoretical question, and the answer is clearly, yes, there can be. As others have said, the specification does not require the compiler to produce byte-for-byte identical class files.

Even if every compiler currently in existence produced the same byte code in all circumstances (different hardware, etc.), the answer tomorrow might be different. If you never plan to update javac or your operating system, you could test that version's behavior in your particular circumstances, but the results might be different if you go from, for example, Java 7 Update 11 to Java 7 Update 15.

What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?

That's unknowable.

I don't know if configuration management is your reason for asking the question, but it's an understandable reason to care. Comparing byte codes is a legitimate IT control, but only to determine if the class files changed, not to determine if the source files did.
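As a sketch of what such a control might look like (the class name `ClassFileFingerprint` and the command-line usage are assumptions, not part of any existing tool), the following computes a SHA-256 fingerprint of two class files and reports whether they match. A mismatch says only that the bytes differ; it says nothing about whether the sources do.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ClassFileFingerprint {

    // Returns the SHA-256 digest of the given bytes as lowercase hex.
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java ClassFileFingerprint <first.class> <second.class>");
            return;
        }
        String first = sha256Hex(Files.readAllBytes(Paths.get(args[0])));
        String second = sha256Hex(Files.readAllBytes(Paths.get(args[1])));
        System.out.println(first + "  " + args[0]);
        System.out.println(second + "  " + args[1]);
        System.out.println(first.equals(second) ? "class files are identical" : "class files differ");
    }
}
```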