-1

I need to implemenet a simple dynamic array that can work with any type.

Right now my void** implementation is ~50% slower than using int* directly:

#define N 1000000000

// Takes ~6 seconds
void** a = malloc(sizeof(void*) * N);
for (int i =0; i < N; i++) {
        *(int*)(&a[i]) = i;

}
printf("%d\n", *(int*)&a[N-1]);

// Takes ~3 seconds
int* b = malloc(sizeof(int) * N) ;
for (int i =0; i < N; i++) {
        b[i] = i;
}
printf("%d\n", b[N-1]);

I'm not a C expert. Is there a better way to do this?

Thanks

edit

Look like using void** is a bad idea. Is there a way to implement this with void*?

Here's how it's implemented in Go:

type slice struct {
        array unsafe.Pointer
        len   int
        cap   int
}

I'd like to do something similar.

edit2

I managed to implement this with void*.

The solution was really simple:

 void* a = malloc(sizeof(int) * N);
 for (int i = 0; i < N; i++) {
    ((int*)a)[i] = i;
 }
 printf("%d\n", ((int*)a)[N-1]);

Performance is the same now.

Alex
  • 34,581
  • 26
  • 91
  • 135
  • your compiler probably has a hard time optimizing the first version, because your pointer types are inconsistent; there is nowhere a pointer to a pointer, yet you declare `void** a`. – Ctx Oct 20 '18 at 15:02
  • What's the purpose of the `void **a` and all that trickery? What problem is that supposed to solve? – Some programmer dude Oct 20 '18 at 15:04
  • It looks like the `void** a` version would cause undefined behavior. You have a pointer to pointer type, you initialized the first array of points, but then the pointers that are pointed to are not initialized and you write to them. Basically, you're writing to an array full of garbage pointers. – Kamil Jarosz Oct 20 '18 at 15:05
  • 2
    Probably, `sizeof(void*)` is larger than `sizeof(int)` Hint: you can print the sizes! – wildplasser Oct 20 '18 at 15:05
  • @Someprogrammerdude I'm trying to implement a dynamic array that works with any type. Like vector in C++. – Alex Oct 20 '18 at 15:05
  • I changed it to `sizeof(void*)`. Same result. – Alex Oct 20 '18 at 15:06
  • @Alex in this case, you have to allocate memory for the array members and save the pointers to the array members `a[0]`. Something like `a[n] = malloc(sizeof someobject); *(a[n]) = someobject;` (cast still has to be added of course) – Ctx Oct 20 '18 at 15:09
  • @Ctx here no allocation is used for every single element: https://stackoverflow.com/questions/21019406/dynamic-array-of-void-pointers – Alex Oct 20 '18 at 15:19
  • @Alex Well, it depends if you want to store an existing object or a _copy_ of that object. It might be necessary to store copies if the lifetime of the existing object is too short or unclear – Ctx Oct 20 '18 at 15:21
  • I really don't want to be calling malloc N times... Is there a way to implement this with void*? I'm not sure how to assign arr[n] = 7 then, since you cant' assign to void*. – Alex Oct 20 '18 at 15:25
  • "*I really don't want to be calling malloc N times ...*" you do not need to. ;) – alk Oct 20 '18 at 15:35
  • 1
    See this [answer on minimizing the number of allocation](https://stackoverflow.com/a/52678788/694576). – alk Oct 20 '18 at 15:37

3 Answers3

3

Your two alternatives programs are not analogous. In the second one, which is valid, you allocate space sufficient to hold N integers, and then assign values to the int-size members of that space. In the first one, however, you allocate space large enough to accommodate N pointers to void and then, without initializing those pointers, you try to assign values to the objects to which they point. Even if those pointers had been initialized to point to int objects, there is an extra level of indirection.

Your first code could be corrected, in a sense, like so:

void** a = malloc(sizeof(void*) * N);
for (int i =0; i < N; i++) {
        a[i] = (void *) i;

}
printf("%d\n", (int) a[N-1]);

That relies on the fact that C allows conversions between pointer and integer types (although not necessarily without data loss), and note that there is only a single level of indirection (array indexing), not two.

Inasmuch as the behavior of your implementation of the first alternative is undefined, we can only speculate about why it runs slower in practice. If we assume a straightforward implementation, however, then such a performance penalty as you observe might arise from poor cache locality for all the array writes.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
  • "*C allows conversions between pointer and integer types*" yes, but only pointer->integer->pointer, and not integer->pointer->integer. Which is not the same. – alk Oct 20 '18 at 15:39
  • 1
    @alk, C allows integer->pointer and pointer->integer. These can be composed in any combination. I think you mean that integer->pointer->integer is not guaranteed to be value-preserving, which is the same thing I intend to convey by "although not necessarily without data loss". In practice, however, `int` -> `void *` -> `int` is *likely* to be value-preserving, and that is implementation-defined, so one can acquire justified confidence about whether one can rely on it with any given implementation. – John Bollinger Oct 20 '18 at 15:46
  • @JohnBollinger now I'm getting `cast to 'void *' from smaller integer type 'int'`. Besides, I don't want to only store ints. – Alex Oct 20 '18 at 15:50
  • @Alex, that would be a warning, not an error. You could silence it by casting first to `intptr_t` and then casting the result to `int`, though it's possible that your compiler would just substitute a different warning. As for being specific to `int`s, the alternative I presented is neither more nor less `int`-specific than your original one. You have to have appropriate type-specific conversions somewhere for this to work. – John Bollinger Oct 20 '18 at 15:57
  • C11 6.3.2.3/6: "*Any pointer type may be converted to an integer type. [...] If the result cannot be represented in the integer type, the behavior is undefined.*" Assuming 64 bit for `void*` and 32 bit for an `int` we have UB when assigning a `void*` to an `int`, haven't we? – alk Oct 20 '18 at 16:00
  • @alk "if the result cannot be represented in the integer type" is not the same thing as "if there exist any pointer values that cannot be represented in the integer type". There is a possibility of UB here, but not a necessity, nor, IMO, a likelihood. And it is overall implementation-defined. – John Bollinger Oct 20 '18 at 19:20
0

Be aware that sizeof(void *) is the double of sizeof(int) on 64 bits processors (8 bytes address versus 4 bytes signed integer). If that's your case, I bet the difference only is page cache miss. You memory unit is required to load two times more pages, which is slow (link for more information here).

Please also note that C++ vectors aren't "dynamic array that can work with any type". They are bound to a type, for instance: std::vector<int> is a dynamic array but where you can only store int.

A solution to your problem would be to implement some sort of std::vector<void *> in C. But it's not efficient:.

  • You need to do 2 allocations for each element (1 for the container and 1 for the element itself)
  • You need to do 2 levels of indirection each time you access the data (1 to get the pointer in the container and 1 to get the data in the element)
  • You need to store some kind of type information in each element. If not, you don't know what is in your dynamic array
  • C++ refers to the OP's comment clarifying the question: *I'm trying to implement a dynamic array that works with any type. Like vector in C++.* – user4815162342 Oct 20 '18 at 16:46
  • Yes, C++ vector is a bad example, because it's just templates. I could do the same with macros or codegen, but I don't want to have N * nr_types functions. I wanted to implement it the way Go slices work. I managed to do this (see my answer). – Alex Oct 20 '18 at 16:50
0

I managed to implement this with void*.

The solution was really simple:

 void* a = malloc(sizeof(int) * N);
 for (int i =0;i<N;i++) {
  ((int*)a)[i] = i;
 }
 printf("%d\n", ((int*)a)[N-1]);

Performance is the same now.

I also came across this great article that explains how to implement a generic data structure in C:

http://jiten-thakkar.com/posts/writing-generic-stack-in-c

Alex
  • 34,581
  • 26
  • 91
  • 135