Is it ok to create a large array in the heap when you aren't necessarily using all of it?

Question

So I'm looking at a solution to some coding interview type questions, and there's an array inside a struct

#define MAX_SIZE 1000000

typedef struct _heap {
    int data[MAX_SIZE];
    int heap_size;
}heap;

heap* init(heap* h) {
    h = (heap*)malloc(sizeof(heap));
    h->heap_size = 0;
    return h;
}

This heap struct is later created like so

heap* max_heap = NULL;
max_heap = init(max_heap);

First of all, I'd wish this was written in C++ style than C, but secondly if I'm just conscerned about the array, I'm assuming it is equivalent to solely analyze the array portion by changing the code like this

int* data = NULL;
data = (int*)malloc(1000000 * sizeof(int));

Now in that case, is there any problems with declaring the array with the max size if you are probably just using a little bit of it?

I guess this boils down to the question of when an array is created in the heap, how does the system block out that portion of the memory? In which case does the system prevent you from accessing memory that is part of the array? I wouldn't want a giant array holding up space if I'm not using much of it.

your question seems a bit confused. i'm a bit blind guessing there. my blind guess is that you were required to make a pseudo heap allocator, by reserving memory and then coding your own malloc to reserve data in this space. To answer one of your questions, when you allocate space on the heap, it will be reserved by the program, but if you are running on linux, it uses lazy allocation (it maps a memory page (4kiB) only when you try accessing data within it, not when reserving it), so no matter how much data you reserve you will only get it if you use it — laenNoCode, Dec 12 '22 at 08:35
If your system have the space, and you will actually need and use all that space during the life-time of your program, then I'd say it might be okay. Other than that, for C++ use `std::vector` instead, or for C use `malloc` and `realloc` as needed. — Some programmer dude, Dec 12 '22 at 08:35
Depending on implementation it is possible to reserve a large `address space` for the array yet map very little `physical memory` to that space. Then, on any access (read or write) to the addresses which are not mapped to any memory, it is possible to map the `physical memory` into that `address space`, on demand. With this technique (lazy allocation), the allocation of a large array that is only partially used will waste only the memory `address space` - not the actual `physical memory`. To answer this question in more detail you need to specify`what CPU and operating system you are using — Pavel Stepanek, Dec 12 '22 at 08:41
@IaenNoCode No, the actual question had nothing to do with heap. (I can't share the question itself cuz it's a korean website and in korean rip) It's just that the solution this person posted used that kind of data structure. I had written up a (inefficient) solution in python that just used an array to keep some data. Started with an empty array and just used append, insert, and pop to dynamically change the array size as more data was needed or not needed. — itsmarziparzi, Dec 12 '22 at 08:41
the last snippets seems to make no sense. Whats the relation of `data` and `n` ? And please decide for one language. The answer is C and C++ (which are two different languages) is probably drastically different — 463035818_is_not_an_ai, Dec 12 '22 at 08:43
@463035818_is_not_a_number oh wait. you are right. I have no idea why i had data and n separate. Mmmm. I was confused because I was trying to analyze just the array and so was trying to mimic how the "heap" struct was allocated in the heap but instead focus on just the array and not the "heap_size" — itsmarziparzi, Dec 12 '22 at 08:50
@itsmarziparzi - *"the solution this person posted"* We know that there are no qualifications needed to post things on the internet. I bet that people posting "interview solutions" are not super experts, but newbies knowing less C++ than you do. So, if it looks odd, it likely just is. — BoP, Dec 12 '22 at 08:50
@BoP right that's why i wanted to ask so I could figure out if it's good code or not — itsmarziparzi, Dec 12 '22 at 08:51
is it c or c++ code? "good" c++ code is not "good" c code and vice versa — 463035818_is_not_an_ai, Dec 12 '22 at 08:53
@463035818_is_not_a_number I guess good C code since this is written in C. But I'm willing to learn about how to write it in good C++ code as well. And also for the data/n issue would it be right to say that how the array is declared indirectly via being inside the struct is the same as this then? int* data=(int*)malloc(1000000* sizeof(int)); — itsmarziparzi, Dec 12 '22 at 08:58
you should focus on one langauge. If you want to ask about both you can open another quesiton. — 463035818_is_not_an_ai, Dec 12 '22 at 09:00
@laenNoCode thanks for your comment. Can you clarify "accessing data"? What happens when you allocate data[1000] and then you later run data[4]=2? What does it do with all the location that is not index 4? — itsmarziparzi, Dec 12 '22 at 09:08
"Accessing data" means reading or writing to a particular address. Sometimes it can also mean executing instructions at that particular address. — Pavel Stepanek, Dec 12 '22 at 09:22
Normal use is to allocate a "large enough" chunk to cover expected use and then `realloc` from there on if you need more memory still in some special use-case. — Lundin, Dec 12 '22 at 09:36
@itsmarziparzi On the linux kernel, when you first try to read of write to data[4], and it just have been malloc'ed but not accessed, the virtual memory data[4] doesn't point to physical memory (ie real RAM). So the access generate a page fault (close to a segfault) that the os parses and see it's reserved memory so it will map it to physical memory. The mapping to physical memory is done through a mecanism called paging that allocates 4kiB at a time, so if said you accessed to data[1], it's likely data[2] will be mapped at the same time. You can look up demand paging to get more info — laenNoCode, Dec 15 '22 at 12:49

chux - Reinstate Monica · Accepted Answer · 2022-12-12T09:26:41.977

is there any problems with declaring the array with the max size if you are probably just using a little bit of it?

Yes. The larger the allocation size the greater the risk of an out-of-memory error. If not here, elsewhere in code.

Yet some memory allocation systems handle this well as real memory allocations do not immediately occur, but later when needed.

I guess this boils down to the question of when an array is created in the heap, how does the system block out that portion of the memory?

That is an implementation defined issue not defined by C. It might happen immediately or deferred.

For maximum portability, code would take a more conservative approach and allocate large memory chunks only as needed, rather than rely on physical allocation occurring in a delayed fashion.

Alternative

In C, consider a struct with a flexible member array.

typedef struct _heap {
  size_t heap_size;
  int data[];
} heap;

Is it ok to create a large array in the heap when you aren't necessarily using all of it?

1 Answers1