0

Sorry if this is a really stupid question, but seeing as "Java arrays are literally just Objects", it makes no sense to me that they need to have a pre-defined length?

I understand why primitive types do, for example int myInt = 15; allocates 32 bits of memory to store an integer and that makes sense to me. But if I had the following code:

class Integer{
    int myValue;

    public Integer(int myValue){
        this.myValue = myValue;
    }
}

Followed by Integer myInteger = new Integer(15); myInteger.myValue = 5; then there's no limit on the amount of data I can store in myInteger. It's not limited to 32 bits, but rather it's a pointer to an Object which can store any amount of ints, doubles, Strings, or really anything. It allocated 32 bits of memory to store the pointer, but the object itself can store any amount of data, and it doesn't need to be specified beforehand.

So why can't an array do that? Why do I need to tell an array how much memory to allocate beforehand? If an array is "literally just an object" then why can't I simply say String[] myStrings = new String[];myStrings[0] = "Something";?

I'm super new to Java so there's a 100% chance that this is a stupid question and that there's a very simple and clear answer, but I am curious.

Also, to give another example, I can say ArrayList<String> myStrings = new ArrayList<String>();myStrings.add("Something"); without any problem... So what makes an ArrayList different from an array? Why does an array NEED to be told how much memory to allocate when an ArrayList doesn't?

Thanks in advance to anybody who takes the time to fill me in. :)

EDIT: Okay, so far everybody in the comments has misunderstood my post and I feel like it's my fault for wording it wrong. My question is not "how do I define an array?", or "does changing the value of a variable change its memory usage?", or "do pointers store the data of the object they point to?", or "are arrays objects?", nor is it "how do ArrayLists work?" My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)

I hope this makes more sense now... I'm not sure why everybody misunderstood? (Did I word something wrong? If so, let me know and I'll change it for others' convenience)

OOPS Studio
  • Java arrays are objects `Object o = new int[] {1,2,3};` - that is an array with a length of 3. Why? Because it can hold three elements. No more. No less. `Object o = new int[4];` is an array of four elements (all `0`). – Elliott Frisch Jan 03 '21 at 03:49
  • Automobiles and airplanes are both "vehicles". However, it does not automatically follow that my car can fly. – Kevin Anderson Jan 03 '21 at 03:56
  • "then there's no limit on the amount of data I can store in myInteger". Really? Show me how you can store the integer 2^50 in it. (Note: you need more than 32 bits to do that). I'm pretty sure it can only store an `int`. – Sweeper Jan 03 '21 at 03:58
  • I have a feeling that the OP is confusing the number 32 with 32 bits. But there are so many (facially) incorrect facts and assumptions, it is hard to know whether the OP's problems are in his knowledge or his written expression of his knowledge. – Stephen C Jan 03 '21 at 04:00
  • Ah, yeah. I shouldn't have assigned the `int`s the value of `32`, that just confused things. I was using 32 just as a placeholder number, but now everybody thinks I put that there as a means of representing 32 bits....... I'll edit that. – OOPS Studio Jan 03 '21 at 04:30
  • @Sweeper Yes, I'm aware. You misunderstood my question. I'm not saying that the larger the number stored in 32 bits, the more memory it takes up. What I'm saying is that the object `myInteger` can hold any amount of ints, doubles, floats, Strings, etc without me having to tell it how many it will hold beforehand, however an array needs to be told how much memory it will hold before it will hold anything. My question is why is that the case? (As much as I appreciate you being rude, it doesn't answer my question...) – OOPS Studio Jan 03 '21 at 04:42
  • Your statement _can hold any amount of ints, [...]_ is wrong or unclear. The object referenced by the `myInteger` variable has a single field of type `int` called `myValue`. It can only _hold_ the amount of memory needed for that field, ie 4 bytes. – Sotirios Delimanolis Jan 03 '21 at 04:43
  • @SotiriosDelimanolis That's the first good response my question has received. Thank you for actually pointing out the issue and telling me why it's wrong rather than just mocking me based on a misunderstanding. So, if I understand correctly, when I create an object it DOES, in fact, have a "byte limit", but it's generated automatically by reading the variables/methods inside the class? If so, then that makes sense to me... But it still doesn't explain why an ArrayList works differently from an array? Why can an ArrayList hold a flexible amount of data when an array can't? – OOPS Studio Jan 03 '21 at 04:46
  • @OOPSStudio See https://stackoverflow.com/questions/4450628/arraylist-how-does-the-size-increase – Sotirios Delimanolis Jan 03 '21 at 04:47
  • As I understand it, fields only hold references to the object (i.e. a pointer). I would assume the JVM knows how much memory is needed to hold a pointer. And the size of each primitive is well-defined and known by the JVM. Now, I'm in no way familiar with how the JVM represents objects in memory (and discounting different implementations), but I would assume that means the JVM knows how much memory to allocate when creating a new instance of a given class. – Slaw Jan 03 '21 at 04:48
  • As for array vs `ArrayList`, the `ArrayList` automatically creates a _new, larger_ array as needed. This, I assume, does not affect the memory needed by the `ArrayList` instance directly because all that's happening is that the _reference_ to the old array is being updated to a _reference_ to the new array. – Slaw Jan 03 '21 at 04:49
  • @Slaw Then what makes an ArrayList any more efficient than just `String[] strings = new String[1]; strings[0] = "something"; String[] newStrings = new String[2]; newStrings[0] = strings[0]; newStrings[1] = "something2";`? What's the point of an ArrayList then as opposed to just manually creating/destroying arrays? – OOPS Studio Jan 03 '21 at 04:53
  • So you don't have to "manually" create and destroy (throw away) arrays. It also wraps the array with the `List` API, giving you all sorts of operations implemented for you. – Slaw Jan 03 '21 at 04:54
  • @Slaw Hmm... I see, thank you. I guess ArrayLists aren't as interesting as I thought. I will go do some more research into this. Thanks! – OOPS Studio Jan 03 '21 at 04:59

5 Answers

3

My question is why does a pointer to an array need to know how big the array is beforehand, when a pointer to any other object doesn't?

It doesn't. Here, this runs perfectly fine:

String[] x = new String[10];
x = new String[15];

The whole 'needs to know in advance how large it is' refers only to the ARRAY OBJECT. As in, new int[10] goes to the heap, which is like a giant beach, and creates a new treasure chest out of thin air, big enough to hold exactly 10 ints (Which, being primitives, are like coins in this example). It then buries it in the sand, lost forever. Hence why new int[10]; all by its lonesome is quite useless.

When you write int[] arr = new int[10];, you still do that, but you now also make a treasure map. X marks the spot. 'arr' is this map. It is NOT AN INT ARRAY. It is a map to an int array. In java, both [] and . are 'follow the map, dig down, and open er up'.

arr[5] = 10; means: Follow your arr map, dig down, open up the chest you find there, and you'll see it has room for precisely 10 little pouches, each pouch large enough to hold one coin. Take the 6th pouch. Remove whatever was there, put a 10ct coin in.

It's not the map that needs to know how large the chest is that the map leads to. It's the chest itself. And this is true for objects as well, it is not possible in java to make a treasure chest that can arbitrarily resize itself.
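To make the map-versus-chest distinction concrete, here is a minimal sketch. The class and method names (MapVersusChest, redrawMap) are made up for illustration; only the array behaviour itself comes from the language.

```java
class MapVersusChest {
    // Returns {length seen through the old map, length seen through the redrawn map}.
    static int[] redrawMap() {
        String[] x = new String[10];   // a chest with exactly 10 slots, plus a map to it
        String[] oldMap = x;           // a second map leading to the same chest
        x = new String[15];            // a brand-new chest; only the map x is redrawn
        return new int[] { oldMap.length, x.length };
    }

    public static void main(String[] args) {
        int[] lengths = redrawMap();
        System.out.println(lengths[0]); // 10: the first chest never changed size
        System.out.println(lengths[1]); // 15: x now leads to a different chest
    }
}
```

The variable x was never told a size. Each chest was given its size when it was created, and neither chest changed afterwards.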

So how does ArrayList work?

Maps-in-boxes.

ArrayList has, internally, a field of type Object[]. That field doesn't hold an object array. It can't. It holds a map to an object array: It's a reference.

So, what happens when you make a new arraylist? It is a treasure chest, fixed size, with room for exactly 2 things:

  1. A map to an 'object array' treasure chest. It also makes that chest, with room for 10 maps, buries it in the sand, and stores the map to this chest-of-maps inside itself.
  2. A coinpouch. The coin inside represents how many objects the list actually contains. The map to the treasure it has leads to a treasure with room for 10 maps, but this coin (value: 0) says that so far, none of those maps go anywhere.

If you then run list.add("foo"), what that does is complicated:

  1. "foo" is an object (i.e. treasure), so "foo" as an expression resolves to be a map to "foo". It then takes your list treasuremap, follows it, digs down, opens the box, and you yell 'oi! ADD THIS!', handing it a copy of your treasuremap to the "foo" treasure. What the box then does with this is opaque to you - that's the point of OO.
  2. But let's dig into the sources of arraylist: What it will do, is query its treasuremap to the object array (which is private, you can't get to it, it's in a hidden compartment that only the djinn that lives in the treasure chest can open), follows it, digs down, and goes to the first slot (why? Because the 'size coin' in the coinpouch is currently at 0). It takes the map-to-nowhere that is there, tosses it out, makes a copy of your map to the "foo" treasure, and puts the copy in there. It then replaces its coin in the coin pouch with a penny, to indicate it is now size 1.
  3. If you add an 11th element, the ArrayList djinn goes out to the other treasure, notices there is no room, and goes: Well, dang. Okay. It then conjures up an entirely new treasure chest that can hold 15 treasure maps, copies the 10 maps from the old treasure into the new treasure chest, adds the copy of the map of the thing you added as 11th, then goes back to its own chest, rips out the map to the old treasure and replaces it with a map to the newly made treasure (with 15 slots), and puts an 11ct coin in the pouch.
  4. The old treasure chest remains exactly where it is. If nobody has any maps to this (and nobody does), eventually, the beachcomber finds it, gets rid of it (that'd be the garbage collector).

Thus, ALL treasure chests are fixed size, but by replacing maps with new maps and conjuring up new treasure chests, you can nevertheless make it look like ArrayList is capable of shrinking and growing.
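The steps above can be sketched in a few lines of Java. This is a simplified illustration, not the real java.util.ArrayList source: the class name GrowableList and its capacity() method are hypothetical, and the initial capacity of 10 and the 1.5x growth are taken from the walkthrough above.

```java
import java.util.Arrays;

// Hypothetical class for illustration; java.util.ArrayList is more involved.
class GrowableList {
    private Object[] elements = new Object[10]; // the map to the chest-of-maps
    private int size = 0;                       // the coin in the pouch

    void add(Object item) {
        if (size == elements.length) {
            // No room left: make a chest 1.5x as large and copy the old maps into it.
            int newCapacity = elements.length + (elements.length >> 1);
            elements = Arrays.copyOf(elements, newCapacity);
        }
        elements[size++] = item; // store the map, then swap the coin for a bigger one
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return elements[index];
    }

    int size() { return size; }

    // Exposed only so the growth can be observed in this sketch.
    int capacity() { return elements.length; }
}
```

Adding an 11th element to a fresh GrowableList leaves size() at 11 and capacity() at 15. The original 10-slot array was never resized; it was replaced and left for the garbage collector.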

So why don't arrays allow it? Because that shrinking and growing stuff is complicated and arrays expose low-level functionality. Don't use arrays, use Lists.

rzwitserloot
1

You seem to misunderstand what "storage" means. You say "there's no limit on the amount of data I can store", but if you run myInteger.myValue = 5, you overwrite the value of 15 that you put there originally. You still can't store any more than 32 bits; it's simply that you can change which 32 bits you put in that variable.

If you want to see how ArrayList works, you can read the source code; it can expand because if it runs out of space it creates a new larger array and switches its single array variable elementData to it.

Based on your update, it seems like you may be wondering about the ability to add lots of different fields to your object definition. In this case, those fields and their types are fixed when the class is compiled, and from that point on the class has a fixed size. You can't just pile in extra properties at runtime like you can in JavaScript. You are telling it up front about the scale it needs.

chrylis -cautiouslyoptimistic-
  • Yes, I'm aware. Sorry, I think my question was a bit confusing (you're the third person to misunderstand, and I assume that's my fault, not yours). What I'm actually trying to say is that I can store any number of variables/methods inside `myInteger` without manually allocating data to them, but if I want to store that data in an array then I have to tell it how much data there will be beforehand. I know that overwriting a variable doesn't make it take up more space. – OOPS Studio Jan 03 '21 at 04:31
  • @OOPSStudio No, you can't. If you're adding extra fields to your class, you have to do that _at compile-time_, and each class has a fixed data size as well as a fixed type for each value. Arrays are the mechanism by which the JVM handles the concept of "multiple of _any something_". – chrylis -cautiouslyoptimistic- Jan 03 '21 at 04:45
  • Thank you! That answers my question. I marked your answer as Accepted and it tells me what I need to know. I appreciate your time and I'll go do some more research keeping in mind what you shared. I appreciate it. – OOPS Studio Jan 03 '21 at 04:49
1

I'm going to ignore most of the details you've given, and answer the question in your edit.

My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront?

It's worth starting by dealing with "when I make any other object it scales on its own", because this isn't true. If you create a class like this:

class MyInteger {
  public int value;
  public MyInteger(int value) {
    this.value = value;
  }
}

Then that class has a statically defined size. Once you've compiled this class, the amount of memory for an instance of MyInteger is already determined. In this case, it's the object header size (JVM dependent) plus the size of an int (4 bytes), possibly rounded up for alignment.

Once an object has been allocated by the JVM, its size cannot change. It is treated as a block of bytes by the JVM (and importantly, the garbage collector) until it is reclaimed. Classes like ArrayList give the illusion of growing, but they actually work by allocating other objects, which they store references to.

class MyArrayList {
  public int[] values;
  public MyArrayList(int[] values) {
    this.values = values;
  }
}

In this case, the MyArrayList instance will always take the same amount of memory (object header size + reference size), but the array that is referenced may change. We could do something like this:

MyArrayList list = new MyArrayList(new int[50]);

This allocates a block of memory for list, and a block of memory for list.values. If we then do (as ArrayList effectively does internally):

list.values = new int[500];

then the memory allocated for list is still the same size, but we have allocated a new block which we then reference in list.values. This leaves our old int[50] with no references (so it can be garbage collected). Importantly, though, no allocation has changed size. We have reallocated a new, bigger, block for our list to use, and have referenced it from our MyArrayList instance.
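As a rough, runnable version of the above: the grow helper and the ReallocationDemo class are names invented for this sketch, and Arrays.copyOf is used here so the old contents carry over, which the bare new int[500] above does not do.

```java
import java.util.Arrays;

class MyArrayList {
    public int[] values;
    public MyArrayList(int[] values) {
        this.values = values;
    }
}

class ReallocationDemo {
    // Hypothetical helper: points list.values at a new, larger block
    // holding a copy of the old contents. The old block is not modified.
    static void grow(MyArrayList list, int newLength) {
        list.values = Arrays.copyOf(list.values, newLength);
    }

    public static void main(String[] args) {
        MyArrayList list = new MyArrayList(new int[50]);
        list.values[0] = 7;
        int[] oldBlock = list.values;   // keep a reference so we can inspect it

        grow(list, 500);

        System.out.println(oldBlock.length);         // 50: the old block is unchanged
        System.out.println(list.values.length);      // 500: a different, larger block
        System.out.println(list.values[0]);          // 7: contents were copied over
        System.out.println(oldBlock == list.values); // false: two separate allocations
    }
}
```

In real code nobody would hold on to oldBlock, so the int[50] would become unreachable and eligible for garbage collection.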

mange
1

Why do arrays in Java need to have a pre-defined length when Objects don't?

In order to understand this, we need to establish that "size" is a complicated concept in Java. There are a variety of meanings:

  • Each object is stored in the heap as one or more heap nodes, where one of these is the primary node, and the rest are component objects that can be reached from the primary node.

    The primary heap node is represented by a fixed and unchanging number of bytes of heap memory. I will call this [1] the native size of the object.

  • An array has an explicit length field. This field is not declared. It has type int and cannot be assigned to. There is actually a 32 bit field in the header of each array instance that holds the length.

    The length of an array directly maps to its native size. The JVM can compute the native size from the length.

  • An object that is not an array instance also has a native size. This is determined by the number and types of the object's fields. Since fields cannot be added or removed at runtime, the native size does not change. But it doesn't need to be stored since it can be determined (when needed) at runtime from the object's class.

  • Some objects support a class specific size concept. For example, a String has a size returned by its length() method, and an ArrayList has a size returned by its size() method.

    NB:

    1. The meaning of the class specific size is ... class specific.

    2. The class specific size does not correlate to the native size of an instance. (Except in degenerate cases ...)

In fact, all objects have a fixed native size.

[1] This term is solely for the purposes of this answer. I claim no authority for this term ...


Examples:

  1. A String[] has a native size that depends on its length. On a typical JVM it will be 12 + length * (<reference size>) rounded up to a multiple of 16 bytes.

  2. Your Integer class has a fixed native size. On a typical JVM each instance will be 16 bytes long.

  3. An ArrayList object has 2 private int fields and a private Object[] field. That gives it a fixed native size of either 16 or 24 bytes. One of the int fields is called size, and it contains the value returned by size().

    The size of an ArrayList may change, but this is implemented by the code of the class. In order to do this, it may need to reallocate its internal Object[] to make it large enough to hold more elements. If you examine the source code for the ArrayList class, you can see how this happens. (Look for the ensureCapacity and grow methods.)
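A small sketch of the difference between an array's length and a class specific size, using only the public ArrayList API. The class name SizeVersusLength and the method observe are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SizeVersusLength {
    // Returns {array length, list size after one add, list size after twelve adds}.
    static int[] observe() {
        String[] array = new String[10];          // length fixed at creation
        List<String> list = new ArrayList<>(10);  // initial capacity 10, logical size 0

        list.add("first");
        int afterOne = list.size();

        // Go past the initial capacity; ArrayList reallocates its backing array itself.
        for (int i = 0; i < 11; i++) {
            list.add("more" + i);
        }
        return new int[] { array.length, afterOne, list.size() };
    }

    public static void main(String[] args) {
        int[] result = observe();
        System.out.println(result[0]); // 10
        System.out.println(result[1]); // 1
        System.out.println(result[2]); // 12
    }
}
```

The list's size() moved from 1 to 12, but that number is just the value of an int field managed by the class. The array's length stayed at 10 throughout.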


So, the differences between the size(s) of a regular object and the length of an array are:

  • The native size of a regular object is determined solely by the type of the object, and it never changes. It is rarely relevant to the application and it is not exposed via a field.

  • The length of an array depends on the value supplied when you instantiate it. It never changes. The native size can be determined from the length.

  • The class specific size of an object (if relevant) is managed by the class.


To your revised question:

My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)

The point is that at the JVM level, NOTHING scales automatically. The native size of a Java object CANNOT change.

Why? Because increasing the size of the object's heap node would entail moving the heap node, and a heap node cannot be moved without updating all references for the object. That cannot be done efficiently.

(It has been pointed out that the GC can efficiently move heap nodes. However, that is not a viable solution. Running the GC is expensive. It would be highly inefficient to perform a GC in order to (say) grow a single Java array. If Java had been specified so that arrays could "grow", it would need to be implemented using an underlying non-growable array type.)

The ArrayList case is handled by the ArrayList class itself. It does so by (if necessary) creating a new, larger backing array, copying the elements from the old to the new, and then discarding the old backing array. It also adjusts the size field that holds the logical size of the list.

Stephen C
  • "... a heap node cannot be moved without updating all references for the object. That cannot be done efficiently." I'm not so sure about this. Copying collectors have to do exactly this, and they seem to be efficient enough. Both G1GC and Shenandoah perform copying, from what I understand. Maybe you could clarify what you mean when you say "efficiently"? – mange Jan 03 '21 at 05:05
  • I will clarify that. What I mean is that you cannot do it efficiently unless you run the GC, and it is highly inefficient to perform a GC just so that you can grow a single heap node. Hence, you cannot do it efficiently. – Stephen C Jan 03 '21 at 05:07
  • Why can't I do it without running the GC? If I were writing a JVM, what would stop me from writing a low-level "move object" operation, and exposing it in something like `sun.misc.Unsafe`? I don't think this is fundamentally impossible, it's just that it's not in the JVM specification. – mange Jan 03 '21 at 05:14
  • Yes, you can't. It is fundamentally impossible. Because you can't find and update **all** of the references to the object that has been moved. Unless you make all arrays indirection cells. (Actually, that has been tried, but it is not efficient, and it doesn't scale, and it is likely to be a concurrency bottleneck.) But this discussion is going too far off-topic. – Stephen C Jan 03 '21 at 05:22
  • Shenandoah rewrites references, while having pause times that are independent of the heap size. This shows that it is at least possible to rewrite references efficiently. Shenandoah uses forwarding pointers, and updates things during the next GC (which does not need to happen immediately). – mange Jan 03 '21 at 05:34
  • No it doesn't. Independent of heap size doesn't mean fast. Even if the pause time is consistently less than 1 millisecond, that is unacceptable if you potentially have to do this on every array write. And the GC does need to happen immediately. The thread doing the write is blocked from doing anything until the array grow has completed. – Stephen C Jan 03 '21 at 05:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/226762/discussion-between-mange-and-stephen-c). – mange Jan 03 '21 at 05:39
0

Object arrays allocate space for object pointers, and not entire objects in memory.

So new String[10] doesn't allocate space for 10 strings, but for 10 object references that point to whatever strings are stored in the array.
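A quick way to see this, as a sketch (the class name ReferenceSlots and the method make are invented for the example): a freshly created String[10] holds ten null references and no String objects at all.

```java
class ReferenceSlots {
    static String[] make() {
        String[] strings = new String[10]; // ten reference slots, all null so far
        strings[0] = "Something";          // slot 0 now refers to a String object
        return strings;
    }

    public static void main(String[] args) {
        String[] strings = make();
        System.out.println(strings[0]);     // Something
        System.out.println(strings[1]);     // null: a slot, but no String behind it
        System.out.println(strings.length); // 10
    }
}
```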

Martin'sRun
  • Yes, I understand that. I'm not sure if I worded it wrong or if you read it wrong (it's probably my fault) but I understand that I'm allocating memory for pointers... My question is why does a pointer to an array need to know how big the array is beforehand, when a pointer to any other object doesn't? – OOPS Studio Jan 03 '21 at 04:33