
Normally, if I want to allocate a zero-initialized array, I do something like this:

int size = 1000;
int* i = (int*)calloc(size, sizeof(int));

And later my code can do this to check if an element in the array has been initialized:

if(!i[10]) {
  // i[10] has not been initialized
}

However, in this case I don't want to pay the upfront cost of zero-initializing the array, because the array may be quite large (i.e., gigabytes). In this case, though, I can afford to use as much memory as I want.

I think I remember that there is a technique to keep track of which elements in the array have been initialized, without paying any upfront cost, that also allows O(1) checks (not amortized, as with a hash table). My recollection is that the technique requires an extra array of the same size.

I think it was something like this:

int size = 1000;
int* i = (int*)malloc(size * sizeof(int));
int** i_markers = (int**)malloc(size * sizeof(int*));

If an entry in the array is used it is recorded like this:

i_markers[10] = &i[10];

And its use can then be checked later like this:

if(i_markers[10] != &i[10]) {
  // i[10] has not been initialized
}

Of course, this isn't quite right, because i_markers[10] could happen to hold &i[10] by chance.

Can anyone out there remind me of the technique?

Thank you!


I think I remembered it. Is this right? Is there a better way, or are there variations on this? Thanks again. (This was updated to be the right answer.)

struct lazy_array {
    int size;
    int* values;
    int* used;            /* used[i]: slot in back_references, if i is in use */
    int* back_references; /* back_references[s]: index recorded in slot s */
    int num_used;
};

struct lazy_array* create_lazy_array(int size) {
    struct lazy_array* lazy = (struct lazy_array*)malloc(sizeof(struct lazy_array));
    lazy->size = size;
    lazy->values = (int*)malloc(size * sizeof(int));
    lazy->used = (int*)malloc(size * sizeof(int));
    lazy->back_references = (int*)malloc(size * sizeof(int));
    lazy->num_used = 0;
    return lazy;
}

int is_index_used(struct lazy_array* lazy, int index) {
    /* used[index] may be garbage; validate it before trusting it */
    return lazy->used[index] >= 0 &&
        lazy->used[index] < lazy->num_used &&
        lazy->back_references[lazy->used[index]] == index;
}

void use_index(struct lazy_array* lazy, int index, int value) {
    lazy->values[index] = value;
    if (is_index_used(lazy, index))
        return;
    lazy->used[index] = lazy->num_used;
    lazy->back_references[lazy->num_used] = index;
    ++lazy->num_used;
}
John Cashew
  • Do not cast [result of malloc](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – Ilya Oct 02 '15 at 04:24
  • How can you say that you're trying to minimize execution time (or any other resource) and then propose twice the heap usage and extra code for setting/checking the contents of the second array? – user3629249 Oct 02 '15 at 04:28
  • [Interesting reading from another question about calloc](https://stackoverflow.com/a/18480292/3386109). I suggest reading the entire thread, especially the comments under the top-voted answers. My impression is that using `calloc` on a modern operating system should outperform the lazy array. – user3386109 Oct 02 '15 at 04:34
  • if the array is declared in the file global address space, then it will be set to all 0x00 as the process is loaded and goes through the process initialization – user3629249 Oct 02 '15 at 04:39
  • @Ilya The code casts the result of malloc for C++ compatibility. – John Cashew Oct 02 '15 at 04:53
  • @user3629249 in this case I want to use more memory to use less up-front CPU. That is what is desirable in this situation. – John Cashew Oct 02 '15 at 04:54
  • @user3386109 That referenced article is not the whole story, and it is quite misleading. calloc often returns memory from the heap rather than an anonymous segment (just as with malloc, this depends mostly on the size of the allocation, the malloc options, the malloc library, the malloc tunable parameters and the OS's overcommit policy). Even without those factors, a lazy array is still a fit for a different use case than calloc. – John Cashew Oct 02 '15 at 04:58
  • In some operating systems (e.g. Linux) a large `calloc` is as fast as or faster than a large `malloc`, because the OS supplies virtual pages already zeroed anyway. So put away the premature optimization unless you have demonstrated the need through testing! – M.M Oct 02 '15 at 05:18
  • @M.M The capability and performance goals of the system dictate what data structures are required. Testing does little to help you decide between an O(1) and an O(N) algorithm. – John Cashew Oct 02 '15 at 12:41
  • @JohnCashew, The code is not C++, if it were then malloc/calloc would not be used. So there is no need to incorporate any C++ compatibility. That C++ compatibility is just cluttering the code and will increase the headaches when performing maintenance. Also, the file global space is always set to 0x00 while the process is being loaded, so it does not take up any extra CPU cycles. – user3629249 Oct 02 '15 at 18:24
  • @user3629249 This code is C++. malloc/calloc are sometimes appropriate to use in C++. Also this is a general purpose variable sized data structure. Use of the global space would not be an appropriate solution. – John Cashew Oct 02 '15 at 18:53
  • `i_markers[10] = &i[10];` is a problem, assigning a pointer to an `int`. Why not use `int** i_markers`? – chux - Reinstate Monica Oct 02 '15 at 19:24

3 Answers

3

On most compilers/standard libraries I know of, large calloc requests (and malloc for that matter) are implemented in terms of the OS's bulk memory request logic. On Linux, that means a copy-on-write mmap-ing of the zero page, and on Windows it means VirtualAlloc. In both cases, the OS gives you memory that is already zero, and calloc recognizes this; it only explicitly zeroes the memory if it was doing a small calloc from the small allocation heap. So until you write to any given page in the allocation, it's zero "for free". No need to be explicitly lazy; the allocator is being lazy for you.

For small allocations it does need to memset to clear the memory, but then, it's fairly cheap to memset a few thousand (or tens of thousands of) bytes. For the really large allocations where zeroing would be costly, you're getting OS provided memory that's zero-ed for free (separate from the rest of the heap); e.g. for dlmalloc in typical configuration, allocations beyond 256 KB will always be freshly mmap-ed and munmap-ed, which means you're getting freshly mapped copy-on-write mappings of the zero page (the cost to zero them being deferred until you perform a write somewhere in the page, and paid whether you got the 256 KB via malloc or calloc).

If you want better guarantees about zeroing, or to get free zeroing on smaller allocations (though it's more wasteful the closer to one page you get), you can just explicitly do what malloc/calloc do implicitly and use the OS provided zero-ed memory, e.g. replace:

sometype *x = calloc(num, sizeof(*x)); // Or the similar malloc(num * sizeof(*x));
if (!x) { ... do error handling stuff ... }
...
free(x);

with either:

sometype *x = mmap(NULL, num * sizeof(*x), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (x == MAP_FAILED) {  ... do error handling stuff ... }
...
munmap(x, num * sizeof(*x));

or on Windows:

sometype *x = VirtualAlloc(NULL, num * sizeof(*x),  MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!x) { ... do error handling stuff ... }
...
VirtualFree(x, 0, MEM_RELEASE); // VirtualFree with MEM_RELEASE only takes size of 0

It gets you the same lazy initialization (though on Windows, this may mean that the pages have simply been lazily zeroed in the background between requests, so they'd be "real" zeroes when you got them, vs. *NIX where they'd be CoW-ed from the zero page, so they get zeroed live when you write to them).

ShadowRanger
  • Thank you for your response. However, this is not true: "In both cases, the OS gives you memory that is already zero". If the memory is not zero, then the OS must set it to zero. – John Cashew Oct 02 '15 at 04:51
  • In the Linux case, it's using the virtual memory system to copy-on-write mmap the zero page. It's not assigning any RAM to it, it's just setting up page tables to refer to shared zeroes. Those tables reference something that is already zero; it's only when you write to it that a new "real" page is initialized with the zeroes (plus whatever you wrote to it). The mechanism in `VirtualAlloc` may differ, but it's not avoidable in any case; the OS won't hand back uninitialized memory because that presents a security risk; what if some other program had previously stored sensitive data there? – ShadowRanger Oct 02 '15 at 04:54
  • Either way, you don't have a choice in the matter. Large `malloc`s do the exact same thing as large `calloc`s. If you ask for a lot of memory, it's already zeroed; whether it's "free" or not is irrelevant, because even if it cost something to zero it, you don't have any choice but to pay it; the OS is taking the payment (if there is any; like I said, on Linux, there definitely isn't, and Windows may be background clearing `VirtualFree`-ed memory independent of your program's execution) without your input before you even have a chance to object. – ShadowRanger Oct 02 '15 at 04:56
  • In the case where the malloc library can allocate from already-sbrk()ed or mmap()ed memory, malloc does not return zero-initialized memory, but calloc does. – John Cashew Oct 02 '15 at 05:02
  • Oh, and confirmation on `VirtualAlloc`: "If you look at the kernel code, the underlying physical memory blocks are zeroed before being made available for use by the kernel memory manager." [Source](http://www.tech-archive.net/Archive/WindowsCE/microsoft.public.windowsce.platbuilder/2008-07/msg00183.html) – ShadowRanger Oct 02 '15 at 05:02
  • no need to confirm that. any modern OS does that as a security measure. – John Cashew Oct 02 '15 at 05:04
  • I agree `malloc` and `calloc` don't make guarantees (`malloc` on whether some allocations are in fact always zero, `calloc` on whether the zeroing is explicit or implicit), but on at least Linux, Windows and OpenBSD with the standard C runtime libraries, that's how it operates. The threshold for "free" zeroes differs (for `dlmalloc`, the threshold is usually 256 KB, even though pages are usually 4 KB), but it does mean there is a cap on the cost paid. Some details are mentioned on [Wikipedia](https://en.wikipedia.org/wiki/C_dynamic_memory_allocation#Implementations). – ShadowRanger Oct 02 '15 at 05:28
  • The point is that over the life of a long-lived process, calloc() is likely to be an O(N) operation and malloc() is likely to be an O(1) operation. @ShadowRanger – John Cashew Oct 02 '15 at 12:46
  • No, that's my point. The "largebin" requests (as `dlmalloc` calls them) are independent of the rest of the allocator. When the large request is made (`malloc` or `calloc`), it's `mmap`-ed fresh. When it's `free`-ed, it's `munmap`-ed immediately; even if you ask for the same size block a microsecond later, it's `mmap`-ed fresh (which means lazy CoW mapping the zero page again). It doesn't matter if you use `calloc` or `malloc` when the request is large enough. For smaller requests, yes, `calloc` will need to `memset`, `malloc` won't. But zeroing small requests rarely matters perf-wise. – ShadowRanger Oct 02 '15 at 18:14
  • Whether the allocation is mmap()ed or sbrk()ed depends on many other factors, for example the amount and fragmentation of sbrk()ed memory, the mallopt settings and the malloc library. Regardless, I don't want to pay the cost of re-zeroing memory that does not need to be re-zeroed, even if it is copy-on-write. If I don't want to pay the malloc library $1000, why would I want to pay it $1 100 times? – John Cashew Oct 02 '15 at 18:58
  • "lazily zero-ed in the background between requests". That is a waste of the CPU. During the time that the OS is zeroing that memory my process would be suspended. – John Cashew Oct 02 '15 at 19:01
  • Granted, if you're planning to hold the memory rather than freeing it, and you plan to reuse it as "logically" new memory, you could use lazy tricks to avoid anyone zeroing it ever. I'm not sure it will even provide a meaningful performance improvement (premature optimization being the root of all evil, blah, blah, but it's the fun sort of evil!), but that is one scenario where it could help. – ShadowRanger Oct 02 '15 at 19:01
  • You have finally acknowledged and/or understood the purpose of this data structure! – John Cashew Oct 02 '15 at 19:03
  • Well, you originally described it as new allocations, not reuse of your own memory, thus the misunderstanding. BTW, I was right about Windows lazily zeroing `VirtualFree`-ed pages in the background, but apparently it *also* lazily backs them; allocations aren't backed until accessed; once accessed they're given pages from the lazily zeroed set of free pages. Details here: https://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/ So yeah, if it runs out of zero-ed pages, you might block on zeroing when you access the page, paying the cost you don't want. – ShadowRanger Oct 02 '15 at 19:09
2

This can be done, although it relies on undefined behavior. It is called a lazy array.

The trick is to use a reverse lookup table. Every time you store a value, you store its index in the lazy array:

/* Assumes these are allocated (but deliberately NOT initialized) elsewhere:
 *   int *lazy_array;    // lazy_array[v]: candidate slot in table
 *   int *table;         // table[s]: the value stored in slot s
 *   int next_index = 0; // number of slots actually in use
 */
int is_stored(int value); /* forward declaration */

void store(int value)
{
   if (is_stored(value)) return;
   lazy_array[value] = next_index;
   table[next_index] = value;
   ++next_index;
}

int is_stored(int value)
{
  if (lazy_array[value] < 0) return 0;
  if (lazy_array[value] >= next_index) return 0;
  if (table[lazy_array[value]] != value) return 0;
  return 1;
}

The idea is that if the value has not been stored in the lazy array, then the lazy_array[value] will be garbage. Its value will either be an invalid index or a valid index into your reverse lookup table. If it is an invalid index, then you immediately know nothing has been stored there. If it is a valid index, then you check your table. If you have a match then the value was stored, otherwise it wasn't.

The downside is that reading from uninitialized memory is undefined behavior. Based on my experience, it will probably work, but there are no guarantees.

Vaughn Cato
  • Thank you for your thoughts. Would you take a look at my edited answer? I think that using a third array eliminates the possibility of using potentially uninitialized memory. – John Cashew Oct 02 '15 at 04:41
  • @JohnCashew: `lazy->used[index]` might still be uninitialized in `is_index_used()`. – Vaughn Cato Oct 02 '15 at 04:43
  • Are you sure? I think use_index() guarantees that it is initialized. – John Cashew Oct 02 '15 at 04:49
  • @JohnCashew: ok, I see what you mean, but the code doesn't seem correct. Are you expecting the caller of `use_index()` to pass sequential values for `index` starting at 0? If so, it seems like a regular array would suffice. If not, then the `is_index_used()` check seems wrong, since it is checking the index against `num_used`, when they are unrelated. – Vaughn Cato Oct 02 '15 at 05:01
  • @VaughnCato Thanks! You helped me spot the error in is_index_used(). Please take a look again. It is fixed now. And there is no need for the indexes to be used in sequential order. – John Cashew Oct 02 '15 at 05:14
  • @JohnCashew: That does look better, but you can see now that if you pass an arbitrary index, then `lazy->used[index]` might be uninitialized. That's not a bug. It is an intentional design trade-off with the lazy array. – Vaughn Cato Oct 02 '15 at 05:17
  • Yes, it is not a bug. It is intentional. @Vaughn Cato – John Cashew Oct 02 '15 at 12:43
1

There are many possible techniques; everything depends on your task. For instance, you can remember the maximal initialized index, max, of your array. That is, if your algorithm can guarantee that all elements from 0 to max are initialized, you can use a simple check like if (0 <= i && i <= max).

But if your algorithm needs to initialize arbitrary elements (i.e., random access), you need a general solution: for instance, a more effective data structure (not a simple array, but a sparse array or something like it).

So, add more details about your task. I expect we'll find the best solution for it.

Ilya