52

There are various articles out there on the interwebs that try to empirically estimate the overhead of a java.lang.Object in particular JVM implementations. For example, I've seen the size overhead of a bare Object estimated at 8 bytes in some JVMs.

What I would like to know is whether a typical JVM implementation of the extends relationship introduces incremental size overhead at every level of the class hierarchy. In other words, suppose you have a class hierarchy with N levels of subclasses. Is the overhead of the in-memory representation of a class instance O(1) or O(N)?

I imagine it is O(1) because although the size of some of the hidden fluffy stuff you need to be a Java Object (vtable, chain of classes) will grow as the inheritance hierarchy grows, they grow per-class, not per-instance, and the JVM implementation can store constant-size pointers to these entities in a constant-size header attached to every Object.

So in theory, the overhead directly attached to the in-memory representation of any Java object should be O(1) for inheritance depth N. Does anyone know if it's true in practice?

0xbe5077ed
  • 4,565
  • 6
  • 35
  • 77
  • What problem are you trying to solve? Because if you want to use Java, and you can't use primitive types (which you should if you can); then you're left with inheritance. I would expect it's O(N) where N is the inheritance tree length. – Elliott Frisch May 30 '14 at 06:04
  • What would be the *benefit* of making it O(N)? I can't think of a single benefit... – user541686 May 30 '14 at 06:13
  • @BrunoReis That was my first question as well. – Elliott Frisch May 30 '14 at 06:15
  • 2
    @Merhdad Nobody said there would be a 'benefit', and nobody has stated or implied that there is a *choice* about it. – user207421 May 30 '14 at 06:17
  • 2
    I wouldn't worry about the instance memory (I think most JVMs just store a reference to the class, anyway, then do a lookup for inheritance). In practice, you'll get performance issues because dispatching inherited methods isn't necessarily cheap, and you'll pay for this in memory when the JVM inlines the method. – David Ehrmann May 30 '14 at 06:17
  • There could be some sort of overhead due to field layout/padding. I wouldn't be surprised if a class with two byte-typed fields, one declared and one inherited, ended up using two machine words to store them instead of a single one. – Trillian May 30 '14 at 15:52
  • Fair point, @Trillian, and I expect you are right. However, I'm asking more about about overhead intrinsic to the way inheritance is designed in a given JVM implementation than stuff that is related to data alignment. – 0xbe5077ed May 30 '14 at 16:37

5 Answers5

25

When in doubt, look at the source (well, a source; each JVM is free to choose how to do it, as the standard does not mandate any internal representation). So I had a look, and found the following comment within the implementation of JDK 7-u60's hotspot JVM:

// A Klass is the part of the klassOop that provides:
//  1: language level class object (method dictionary etc.)
//  2: provide vm dispatch behavior for the object
// Both functions are combined into one C++ class. The toplevel class "Klass"
// implements purpose 1 whereas all subclasses provide extra virtual functions
// for purpose 2.

// One reason for the oop/klass dichotomy in the implementation is
// that we don't want a C++ vtbl pointer in every object.  Thus,
// normal oops don't have any virtual functions.  Instead, they
// forward all "virtual" functions to their klass, which does have
// a vtbl and does the C++ dispatch depending on the object's

The way I read it, this means that, for this (very popular) implementation, object instances only store a pointer to their class. The cost, per-instance of a class, of that class having a longer or shorter inheritance chain is effectively 0. The classes themselves do take up space in memory though (but only once per class). Run-time efficiency of deep inheritance chains is another matter.

tucuxi
  • 17,561
  • 2
  • 43
  • 74
  • Is there an O(N)-ness to, say, the *time cost* of virtual method invocation? I suspect there is for `instanceof` and casting expressions... – 0xbe5077ed May 30 '14 at 15:31
  • 1
    @0xbe5077ed The JVM could store the class relationships as a graph and do the transitive closure to obtain O(1) `instanceof` implementation... at the cost of a bigger graph (it might have O(N^2) arcs...). Generally the depth of inheritance chains doesn't grow with `N`, or grows with a very small proportionality, so that the graph wouldn't, usually, be that big. – Bakuriu May 31 '14 at 08:58
  • 2
    @Bakuriu: many years ago we (my colleagues and I) discovered that we actually *needed* a trick for `instanceof`, although we didn't use that exact trick. The reason is that we didn't want to run out of stack while computing an `instanceof` during exception handling. That wouldn't be a problem with single-inheritance, a loop with fixed stack use would do it. But multiple inheritance (interfaces) means that a simple tree search requires much stack in the worst case. I don't know what Sun did at the time: our license with them meant we could only crib their code under controlled circumstances. – Steve Jessop Jun 02 '14 at 10:32
24

The JVM specification states

The Java Virtual Machine does not mandate any particular internal structure for objects.

So the specification does not care how you do it. But...

In some of Oracle’s implementations of the Java Virtual Machine, a reference to a class instance is a pointer to a handle that is itself a pair of pointers: one to a table containing the methods of the object and a pointer to the Class object that represents the type of the object, and the other to the memory allocated from the heap for the object data.

So in typical Oracle implementations it is O(1) for methods. This method table is the Method Area which is per class.

The Java Virtual Machine has a method area that is shared among all Java Virtual Machine threads. The method area is analogous to the storage area for compiled code of a conventional language or analogous to the "text" segment in an operating system process. It stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods (§2.9) used in class and instance initialization and interface initialization.

Also, about method entries

The method_info structures represent all methods declared by this class or interface type, including instance methods, class methods, instance initialization methods (§2.9), and any class or interface initialization method (§2.9). The methods table does not include items representing methods that are inherited from superclasses or superinterfaces.

Sotirios Delimanolis
  • 274,122
  • 60
  • 696
  • 724
  • 1
    In other words, in the most popular implementation, the answer is yes. There is no extra space overhead, per-instance, due to longer or shorter inheritance chains for the instances' class. – tucuxi May 30 '14 at 09:06
  • I'd bet that OpenJDK will be the same. – Donal Fellows May 30 '14 at 12:36
  • 2
    Worth noting that although there's no in-memory cost to inheritance, there is additional data in the serialised form, as it describes the entire type hierarchy of the object. – MikeFHay May 30 '14 at 12:59
4

An instance generally requires the following data, although it's up to the implementation exactly what to do:

  • the instance fields of the class and its parent classes, which I assume you don't mean to include in the term "overhead"
  • some means to lock the object
  • if the garbage collector relocates objects, then some means to record the original hash of the object (for Object.hashCode)
  • some means to access type information

As you guess in your question, in a "normal" Java implementation the type information is stored per-class, not per instance. Part of the definition of "type" is that two instances of the same class necessarily have the same type information, there's no obvious reason not to share it. So you would expect that per-instance overhead to be constant, not to depend on the class hierarchy.

That is to say, adding extra empty classes or interfaces to a class should not increase the size of its instances. I don't think either the language or the JVM spec actually guarantees this, though, so don't make too many assumptions about what a "non-normal" Java implementation is allowed to do.

As an aside, the second and third things on my list can be combined via cunning trickery, so that both of them together are a single pointer. The article you link to refers to references taking 4 bytes, so the 8 bytes it comes up with for an object is one pointer to type information, one field containing either a hashcode or a pointer to a monitor, and probably some flags in the lowest 2 bits of one or both of those pointer fields. Object will (you'd expect) be larger on a 64-bit Java.

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
2

Double and Integer, which extend Number, which extends Object, do not have an O(n) behavior, that is, an Integer is not 3X the size of an Object, so I think the answer is O(1). e.g. see this old SO question

Community
  • 1
  • 1
user949300
  • 15,364
  • 7
  • 35
  • 66
  • 1
    they are also generally auto-boxed/unboxed (when using them as atoms and and not actual objects), and heavily optimized by implementations, so they may not be the most representative examples. The linked question refers to Java 1.3, before auto-boxing. – tucuxi May 30 '14 at 08:58
1

in theory, the overhead directly attached to the in-memory representation of any Java object should be O(1) for inheritance depth N. Does anyone know if it's true in practice?

It can't be O(1) unless there are zero instance members at every level. Every instance member requires space per instance.

user207421
  • 305,947
  • 44
  • 307
  • 483
  • I thought the question was about the depth in the inheritance tree from java.lang.Object – Elliott Frisch May 30 '14 at 06:06
  • @ElliotFrisch "Is the overhead of the in-memory representation of a class instance O(1) or O(N)?". – user207421 May 30 '14 at 06:07
  • I re-read your answer... I think it must then be some O(N*log k) where N is the depth in the tree and (log k) accounts for the member variables (assuming that's actually logarithmic in practice). – Elliott Frisch May 30 '14 at 06:10
  • @ElliotFrisch If that's your answer, post it. But I don't understand where you get this *log(k)* stuff from, or how space for member variables can be 'logarithmic in practice', whatever that means. – user207421 May 30 '14 at 06:15
  • It's not an answer. To OPs question, is this cost at runtime or cost in some other sense? At runtime, you have to account for every member variable of the current Object (and all of the members for each parent) up the chain.... or is it O(N) and that's considered as amortization across the instantiation.... – Elliott Frisch May 30 '14 at 06:18
  • O(1) when referring to "space per-instance when the inheritance chain has length N" is perfectly reasonable. OP is asking for "space per-instance", and not "space for having more instances", which would indeed grow linearly. – tucuxi May 30 '14 at 09:03
  • It's entirely reasonable to have space per object for the fields of the classes in the inheritance hierarchy. In fact, it would be downright strange not to do that. But that isn't _overhead_. The space overhead is the cost of everything else, especially so that an object can know its class and be lockable; a naïve implementation would store those inside `Object` but other answers indicate that modern Java implementations are better than that… – Donal Fellows May 30 '14 at 12:42