
I am trying to wrap my head around what the raw memory looks like in different languages when using an array.

Consider the following Java code:

String a = "hi";
String b = "there";
String c = "everyone";
String[] array = {a, b, c};

Obviously the array is holding references, and not objects; that is, there is a contiguous array in memory of three references, each of which points to some other location in memory where its object sits. So the objects themselves aren't necessarily sitting in three contiguous buckets; rather the references are.

Now consider this:

String[] array = {"hi", "there", "everyone"};

I'd imagine in this situation the Strings exist somewhere with all the other constants in memory, and then the array holds references to those constants in memory? So, again, in raw memory the array doesn't look like ['h', 'i', '\0', 't', 'h', 'e', 'r', 'e'... (etc)] (using C-style termination just for convenience). Rather, it's more like ['a83a3edf', 'a38decd', ... (etc)] where each element is a memory location (reference).
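For contrast, the byte-level layout imagined above is what C produces if you ask for it with a two-dimensional `char` array instead of an array of pointers. A minimal sketch, assuming the three strings from the example; `words`, `words_size` and `byte_at` are hypothetical names used only for illustration:

```c
#include <stddef.h>

/* Each row is a fixed 9-byte buffer ("everyone" plus its '\0'), so the
   characters themselves are stored inline; shorter strings are padded
   with '\0' bytes up to 9. */
static const char words[3][9] = {"hi", "there", "everyone"};

/* Total size of the array object: 3 rows * 9 chars = 27 bytes. */
size_t words_size(void)
{
    return sizeof words;
}

/* Reads byte `offset` of the array, treating all 27 bytes as one flat run. */
char byte_at(size_t offset)
{
    const char *flat = (const char *)&words;  /* character pointer over the whole object */
    return flat[offset];
}
```

The trade-off is that every row is as wide as the longest string, so `"hi"` occupies 9 bytes.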

My conclusion from this thought process is that in Java, you can never ever imagine arrays as buckets of contiguous objects in memory, but rather as contiguous references. I can't think of any way to guarantee objects will always be stored contiguously in Java.

Now consider C:

char *a = "hi";
char *b = "there";
char *c = "everyone";
char *array[] = {a, b, c};

The code above is functionally equivalent to the Java above -- that is, the array holds references (pointers) to some other memory location. Like Java, the objects being pointed to aren't necessarily contiguous.
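The array-of-pointers layout can be confirmed with `sizeof`: the array's size depends only on how many pointers it holds, not on the lengths of the strings. A minimal sketch mirroring the example above; the function names are hypothetical, and `const char *` replaces `char *` because modifying a string literal is undefined behaviour:

```c
#include <stddef.h>

/* The array stores three addresses, not the characters themselves. */
size_t pointer_array_bytes(void)
{
    const char *a = "hi";
    const char *b = "there";
    const char *c = "everyone";
    const char *array[] = {a, b, c};
    return sizeof array;  /* 3 * sizeof(const char *), whatever the string lengths */
}

/* Copying a pointer into the array copies the address, so the element
   compares equal to the variable it came from. */
int element_aliases_variable(void)
{
    const char *b = "there";
    const char *array[] = {"hi", b, "everyone"};
    return array[1] == b;
}
```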

HOWEVER, in the following C code:

struct my_struct array[5];  // allocates 5 * sizeof(struct my_struct) bytes! NOT room for 5
                            // references/pointers, but room for 5 my_structs.

The structs in array ARE contiguously located in raw memory.
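One way to see this is to measure the byte distance between neighbouring elements. The question never shows `struct my_struct`, so the definition below is a hypothetical one chosen to include some padding; `element_stride` and `array_bytes` are illustrative names:

```c
#include <stddef.h>

/* Hypothetical definition; the question never shows struct my_struct. */
struct my_struct {
    char tag;      /* 1 byte, usually followed by padding */
    double value;  /* typically needs 8-byte alignment */
};

static struct my_struct array[5];

/* Distance in bytes between consecutive elements: exactly one struct,
   padding included, with no pointers in between. */
ptrdiff_t element_stride(void)
{
    return (const char *)&array[1] - (const char *)&array[0];
}

/* Whole array: 5 * sizeof(struct my_struct) bytes. */
size_t array_bytes(void)
{
    return sizeof array;
}
```

`sizeof(struct my_struct)` already includes any padding the compiler inserts, which is why the elements sit exactly that many bytes apart.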

Now for my concrete questions:

  1. Was I correct in my assumption that in Java, arrays must ALWAYS hold references, as the programmer only ever has access to references in Java? What about for raw data types? Will it work differently then? Will an array of ints in Java look just like one in C in raw memory (besides the Object class cruft Java will add)?

  2. In Java, is there no way for the programmer to guarantee contiguous memory allocation of objects? It might happen by chance, or with high probability, but the programmer cannot GUARANTEE it will be so?

  3. In C, programmers CAN create raw arrays of objects (structs) contiguously in memory, as I have shown above, correct?

  4. How do other languages deal with this? I'm guessing Python works like Java?

The motivation for this question is that I want a solid understanding of what is happening at the raw memory level with arrays in these languages. Mostly for programmer-interview questions. I said in a previous interview that an array (not in any language, just in general) holds objects contiguously in memory like buckets. It was only after I said this that I realized that's not quite how it works in a language like Java. So I want to be 100% clear on it.

Thanks. Let me know if anything needs clarification.

Mike
bob
  • What about an array of, e.g., five *integers*? Do you think that would be stored as five references in Java as well? Besides that, the simple answer to this question is that you can't simply compare languages like this: different languages do different things and have different semantics even for things that seem the same. – Some programmer dude Sep 03 '15 at 17:19
  • C does not have references. You are comparing apples with oranges. – too honest for this site Sep 03 '15 at 17:22
  • By _integer_ do you mean `int` or `Integer`? The former is a scalar, the latter is an object wrapper that would, in fact, be stored as references to objects on the heap. (eg `Integer[]` vs `int[]`). In Python even integers are references to objects. – dsh Sep 03 '15 at 17:24
  • @Olaf, pointers in C are basically references in Java, so it's hardly apples to oranges. They might operate in different ways and expose different amounts of control to the user, but an array in Java (of non-primitive types) is essentially equivalent to an array of pointers in C, no apples or oranges involved whatsoever. That is, they both truly store pointers to other memory locations where the referenced objects reside. – bob Sep 04 '15 at 05:26
  • No, they are not. A reference does not have a value of its own; its value is always the object it refers to, whereas a pointer does (it is a first-class object). That is **exactly** your misconception, and that was what I tried to hint at! The contents of a C array are **always** what you specify: a pointer (to whatever), an `int`, a `struct`. – too honest for this site Sep 04 '15 at 10:34

4 Answers


you can never ever imagine arrays as buckets of contiguous objects in memory, but rather as contiguous references.

In theory you are right; in practice, the JVM doesn't randomise memory access. It allocates memory sequentially, and during a GC it copies objects in order of discovery (or reverse order).
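Sequential allocation here usually means bump-pointer allocation: HotSpot, for example, gives each thread a thread-local allocation buffer (TLAB) and allocates inside it by advancing a pointer, so objects created one after another tend to end up next to each other. A minimal C sketch of that technique with a fixed-size arena; `struct arena`, `arena_alloc`, the 1024-byte buffer and the 8-byte rounding are assumptions for illustration, not JVM internals:

```c
#include <stddef.h>

/* Hypothetical arena: a fixed buffer plus a "top" offset. */
struct arena {
    unsigned char buf[1024];
    size_t top;
};

/* Round n up to a multiple of 8, keeping every block's offset from the
   start of buf a multiple of 8. */
static size_t round_up8(size_t n)
{
    return (n + 7u) & ~(size_t)7u;
}

/* Bump-pointer allocation: hand out the next free bytes and advance top,
   so consecutive calls return adjacent blocks. */
void *arena_alloc(struct arena *ar, size_t n)
{
    if (n > sizeof ar->buf)
        return NULL;
    size_t need = round_up8(n);
    if (need > sizeof ar->buf - ar->top)
        return NULL;  /* full; a JVM would typically take a new buffer or collect */
    void *p = ar->buf + ar->top;
    ar->top += need;
    return p;
}
```

Nothing is ever freed individually here; like a young generation, the whole arena would be reset or evacuated at once.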

Was I correct in my assumption that in Java, arrays must ALWAYS hold references, as the programmer only ever has access to references in Java?

Yes, unless you have an array of primitives, of course.

What about for raw data types? Will it work differently then?

Primitives and references are both stored contiguously in an array. Their layout is basically the same.

Will an array of ints in Java look just like one in C in raw memory (besides the Object class cruft Java will add)?

Yes.

In Java, is there no way for the programmer to guarantee contiguous memory allocation of objects?

Not unless you use off-heap memory. Though generally this isn't as much of a problem as you might think, as most of the time the objects will be contiguous in memory.

It might happen by chance, or with high probability, but the programmer can not GUARANTEE it will be so?

Correct. Usually you have bigger problems when you look at the worst 0.1% of latencies or above.

In C, programmers CAN create raw arrays of objects (structs) contiguously in memory, as I have shown above, correct?

Yes. You can do it in Java as well, but you have to use off-heap memory. There are a number of libraries that support this, such as Javolution, Chronicle and SBE.
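Those libraries each have their own APIs, which are not reproduced here. The general idea behind this kind of off-heap storage, fixed-size records at fixed byte offsets in one flat block of memory, is what a C array of structs gives you for free. A minimal C sketch of that encoding, assuming a hypothetical 12-byte record holding an `int32_t` id and a `double` price; `put_record`, `get_id`, `get_price` and the offsets are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: bytes 0..3 = int32_t id, bytes 4..11 = double price. */
enum { ID_OFFSET = 0, PRICE_OFFSET = 4, RECORD_SIZE = 12 };
_Static_assert(sizeof(double) == 8, "layout assumes an 8-byte double");

/* Record i always starts at byte i * RECORD_SIZE of one flat buffer. */
void put_record(unsigned char *buf, size_t i, int32_t id, double price)
{
    unsigned char *rec = buf + i * RECORD_SIZE;
    memcpy(rec + ID_OFFSET, &id, sizeof id);  /* memcpy sidesteps alignment rules */
    memcpy(rec + PRICE_OFFSET, &price, sizeof price);
}

int32_t get_id(const unsigned char *buf, size_t i)
{
    int32_t id;
    memcpy(&id, buf + i * RECORD_SIZE + ID_OFFSET, sizeof id);
    return id;
}

double get_price(const unsigned char *buf, size_t i)
{
    double price;
    memcpy(&price, buf + i * RECORD_SIZE + PRICE_OFFSET, sizeof price);
    return price;
}
```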

Peter Lawrey

Low-level languages like C make you deal with memory layout, and whether you have a pointer to somewhere else or a value right here. Make sure you handle stack vs heap allocation correctly and don't forget to free() every pointer you malloc().
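The "pointer to somewhere else or a value right here" choice also applies on the heap. A minimal C sketch, using a hypothetical `struct point`: `make_flat` allocates the structs side by side in one block, while `make_indirect` builds a contiguous array of pointers to separately allocated structs, roughly the shape of a Java array of objects. Each `malloc` has a matching `free`:

```c
#include <stddef.h>
#include <stdlib.h>

struct point { int x, y; };  /* illustrative type */

/* One malloc: n structs side by side, released by a single free(). */
struct point *make_flat(size_t n)
{
    struct point *arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        arr[i].x = (int)i;
        arr[i].y = -(int)i;
    }
    return arr;
}

/* n + 1 mallocs: contiguous pointers whose targets are separate heap blocks. */
struct point **make_indirect(size_t n)
{
    struct point **ptrs = malloc(n * sizeof *ptrs);
    if (ptrs == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        ptrs[i] = malloc(sizeof *ptrs[i]);
        if (ptrs[i] == NULL) {  /* undo the allocations made so far */
            while (i > 0)
                free(ptrs[--i]);
            free(ptrs);
            return NULL;
        }
        ptrs[i]->x = (int)i;
        ptrs[i]->y = -(int)i;
    }
    return ptrs;
}

/* Free each struct, then the pointer array itself. */
void free_indirect(struct point **ptrs, size_t n)
{
    if (ptrs == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(ptrs[i]);
    free(ptrs);
}
```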

Higher-level languages like Java, Python, and JavaScript take away that low-level layout of memory. All objects are on the heap and you have a reference to them. While the reference is similar to a pointer, it is opaque and not directly associated with a given memory location. As such, data structures contain references to objects (arrays of primitives, such as Java's int[], being the exception).

dsh

Regarding 1): In Java, arrays are objects, and objects are stored on the heap. Since the heap might not be contiguous, arrays might not be contiguous either.

Regarding 4): In Python, you can create a contiguous array if you use SciPy (its arrays are NumPy ndarrays).

arash javanmard

I can't speak in any detail to Java, although my understanding is that given the following code

int arr[] = new int[N];

the local (stack) variable arr contains a reference to an array object on the heap, giving us a layout something like this:

          +---+
     arr: |   |---+
          +---+   |
           ...    |
          +---+   |
      cp: |   |<--+  class pointer 
          +---+ 
     flg: |   |      flags
          +---+
     lck: |   |      locks
          +---+
      sz: |   |      size
          +---+
  arr[0]: |   |
          +---+
  arr[1]: |   |
          +---+
           ...
          +---+
arr[N-1]: |   |
          +---+

For an array of primitive types, the values are stored directly in arr[0], arr[1], etc. For an array of class types, each element of the array stores a reference to an instance of that class, so there's another level of indirection. The references themselves are stored contiguously, but the instances that they point to are not (or at least, aren't guaranteed to be).

C and C++ arrays are a lot less complicated. Given the following code:

 int arr[N];

you get the following:

          +---+
  arr[0]: |   |
          +---+ 
  arr[1]: |   |
          +---+ 
           ...
          +---+
arr[N-1]: |   |
          +---+

There's no indirection or metadata involved with a C array. There's no storage set aside for an object arr to point to the first element of the array. If the array has auto extent (meaning it was declared within a block and not static), then the memory for the array elements is allocated the same as for any local variable.
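The absence of metadata can be checked with `sizeof`. A minimal sketch; `N = 8` and the function names are arbitrary choices for illustration:

```c
#include <stddef.h>

#define N 8

/* A local array: automatic storage, N ints and nothing else. */
size_t local_array_bytes(void)
{
    int arr[N];
    return sizeof arr;  /* N * sizeof(int): no length field, no class pointer */
}

/* No separate pointer object exists: the array name converts to the
   address of arr[0], and the array object starts at that same address. */
int name_is_first_element(void)
{
    int arr[N] = {0};
    int *p = arr;
    return p == &arr[0] && (void *)&arr == (void *)&arr[0];
}
```

Because the length is not stored anywhere, `sizeof arr` only gives the array size where the array itself is in scope; passed to a function, `arr` decays to a plain `int *` and the length has to be passed separately.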

For any type T, T arr[N] will set aside N contiguous elements to store values of type T. If T is an obnoxious struct type, then T arr[N] stores N contiguous instances of that obnoxious struct type.

John Bode