2

So, I've studied programming languages using the book of Robert W. Sebesta, "Concepts of Programming Languages". There is one interesting paragraph that compares heterogeneous array to record. Apparently, the access time of an array is much slower because subscripts are dynamic, in contrast to static field names.

Question: So why is it faster? Does it have something to do with the utilization of Stack and Heap memory?

page 265, chapter 6:

Records are used when the collection of data values is heterogeneous and the different fields are not processed in the same way. Also, the fields of a record often need not be processed in a particular order. Field names are like literal, or constant, subscripts. Because they are static, they provide very efficient access to the fields. Dynamic subscripts could be used to access record fields, but it would disallow type checking and would also be slower.

Sakhund
  • 228
  • 1
  • 5
  • 32
  • Can you please quote the paragraph you're referring to? Currently the question severely lacks context. Is it related to [static vs dynamic dispatch](https://en.wikipedia.org/wiki/Static_dispatch)? – Aivean Nov 09 '21 at 23:15
  • 2
    @Aivean I edited it, you should see the paragraph I was referring to. – Sakhund Nov 09 '21 at 23:19
  • 1
    It says 'Because they are static ...'. What does it mean to be static? – Sakhund Nov 09 '21 at 23:21
  • 2
    "static" means "known in compile time". I.e., if you have a "record" `point` with fields `x` and `y`. When you access it as `point.y` compiler knows the address shift of `y` related to the memory address of `point`. If you don't know what field you access, i.e. you have a string variable `field="y"` and then call `point[field]` (Javascript style), compiler doesn't know which field you are accessing and has to add runtime look up for the memory shift in the field table for your object (or "record"). Note, this is greatly simplified. – Aivean Nov 09 '21 at 23:35
  • @Aivean great explanation, man. Understood. Especially thank you for simplifying, that helped. – Sakhund Nov 09 '21 at 23:39
  • 2
    Also, dynamic subscripts must be checked to insure that they are in-bounds. – RBarryYoung Nov 10 '21 at 00:01
  • 1
    Can yo clarify what you mean by "heterogeneous array"? Also why subscripts are assumed to be dynamic? Indeed, if you know the location of a field in the array, then the indexing is a static one. Can you tell us more about the context (compiled vs interpreted code, kind of target programming language, etc.). Low-levels details often matter a lot when it comes to performance. – Jérôme Richard Nov 10 '21 at 00:36
  • @Jérôme Richard, I'm not looking for a specific language, does not matter interpreted or compiled. Just general definition. – Sakhund Nov 10 '21 at 00:45
  • The book defines heterogeneous arrays this way: "Python’s arrays are called lists, although they have all the characteristics of dynamic arrays. Because the objects can be of any type, these arrays are heterogeneous". – Sakhund Nov 10 '21 at 00:46

1 Answers1

3

First I'd like to mention that the concept of "performance" doesn't really exist detached to from the actual hardware and low level implementation. When this book tells that a certain way of data organization is "faster" it makes a number of assumptions regarding the actual compiler/interpreter implementation and the hardware (von Neumann architecture, CPU instruction set, memory layout).

With that out of the way, let's talk about the possible differences in the implementations of the static record, homogeneous and heterogeneous array.


1. Static record

struct Point {
    int x;
    int y;
};

This is a simple C++ struct. When you do:

Point *p = new Point();

memory allocator allocates 8 bytes on the heap and stores the address of the start of this block in p.

Compiler knows the size of Point in advance (8 bytes), for each field of the Point compiler knows its size and shift (i.e. y has size 4 and shift 4).

When you access a field of the Point (p -> x), compiler replaces it with calculating the actual memory address of the field x (address of p + shift of x) and accessing this address. No runtime overhead whatsoever.


2. Homogeneous array

int *arr = new int[10];

This is a simple homogeneous array in C++. Similar to the struct, allocator allocates 4 * 10 bytes on the heap and stores its start address in arr.

When you access element of the array, arr[i], compiler knows the address of the arr, size of its elements (they are the same, as array is homogeneous), so it can calculate the memory address you're accessing as arr + i * element_size (a single CPU instruction on most architectures).

Again, not much overhead, if you're not counting accessing i in addition to the arr.


3. Heterogeneous array

Now, that is where things get interesting. There is no direct low-level representation for heterogeneous array. There are multiple possible implementations that map this high-level concept into the actual memory and machine code.

Let's consider one of them.

enum ElType {
    INT, POINT
};

struct ArrEl {
    ElType type;
    char* elPtr;
};

ArrEl *arr = new ArrEl[10];

arr[1].type = POINT;
arr[1].elPtr = (char*)(new Point {3,4} );

arr[2].type = INT;
arr[2].elPtr = (char*)(new int {2} );

Notice, since the array can now store elements of any type (for this demo only int or Point):

  1. Array elements can have different size (Point is 8 bytes, int is 4), so actual elements have to be allocated on the heap, and array stores only pointers (Alternative approach would allocate the memory equal to the size of the largest possible element for each element, and it is too wasteful in general case; note: see comments below).
  2. To know the actual type of the stored element, additional metadata has to be stored. This approach choses to store it in the array, but most actual implementations (including python and java) store it together with the actual object, as an "object header". See the simplified implementation here.

Now to access an element of such array, this metadata has to be checked:

ArrEl el = arr[2];

if (el.type == INT) {
  int *value = (int *) el.elPtr;
  std::cout <<  *value;
} else if (el.type == POINT) {
  Point *p = (Point *) el.elPtr;
  std::cout <<  p->x;
}

The overhead of accessing heterogeneous array consists of the additional indirection: first you have to access the element of the array, and then follow the pointer to the actual value on the heap, and check the metadata.

There is also an additional memory overhead of storing all the pointers and metadata. This is especially noticeable, when you compare homogeneous arrays of "simple" types, such as int to heterogeneous arrays that can store, say int and float (note: see comments below).


Ok, now, when I've given the general idea why heterogeneous arrays can be slow in theory, let's talk about the real world.

Most modern VMs for languages that have heterogeneous arrays have JIT. Opposite to the static compilers (like C++), JIT can make some optimistic assumptions about the executed code and, if such assumptions fail, recompile code to the more pessimistic variant in runtime.

Back to arrays, although arrays in dynamic languages, such as Javascript and Python, are heterogeneous, it's possible that when array is used in a homogeneous way, JIT compiles it to be homogeneous internally! For example, V8 certainly does that. I don't think Python does that at the moment, but it might in the future.

Also, modern optimizing compilers, including static compilers, can rewrite your code in ways you wouldn't expect. For example, your code creates an object and performs some operations on its fields, but in reality compiler will replace field access with CPU registers. No "actual" object is created at all.

That is why it's always important to verify theoretical assumptions about performance with actual benchmarks.


P.S. all C++ code in this demo is not idiomatic, unsafe and bad. Don't use it at home.

Aivean
  • 10,692
  • 25
  • 39
  • 2
    Note that values can be stored in the "heterogeneous array" without using an additional indirection nor a dynamic allocation. One can use algebraic types to do that. In C, unions can be used to do that. This assume that the list of type is fixed at compile-time (which is often the case). With that, the main cost is the additional condition needed because of the (semi-)dynamic typing. – Jérôme Richard Nov 10 '21 at 17:19
  • 1
    @JérômeRichard, you are right. I briefly mentioned that approach as *"allocate the memory equal to the size of the largest possible element for each element"*, but discarded it as not applicable in general case. But you are correct, if used for only small types of roughly the same size it will have the lowest overhead. – Aivean Nov 10 '21 at 17:55
  • 2
    I partially agree for the size. Note that many allocators introduce a memory overhead for each allocated object (eg. typically 8 bytes and add some padding between objects in order to align object typically on 16 bytes). On my machine, the pointer based implementation with just 4-bytes integers takes 24 bytes/integers. Note you can allocate data only for big structures so that the union type is still relatively small. Allocations are generally expensive and often cause memory diffusion/fragmentation. Some JIT like V8 do this optimization: object data/pointers are stored in a `double` value. – Jérôme Richard Nov 10 '21 at 19:44
  • 2
    @JérômeRichard, I haven't thought of this hybrid approach. Nice to know, thank you for bringing it up. – Aivean Nov 10 '21 at 19:47