
Is it possible to create array entries based on their index, as in

int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y] = someNr;

dynamically/on the fly, without creating foo[0...3][0...4]?

If not, is there a data structure that allows me to do something similar to this in C?

Tomas Berger
  • Sounds like you might want a Map https://stackoverflow.com/questions/21958247/map-like-structure-in-c-use-int-and-struct-to-determine-a-value – bhspencer Apr 30 '19 at 00:32
  • You need to declare the array first *then* assign second unless you're prepared to [assign the values properly](https://stackoverflow.com/questions/3535410/assign-multiple-values-to-array-in-c). Any values you don't initialize are, by definition, uninitialized and contain junk data so you probably want to initialize them all, not selectively. – tadman Apr 30 '19 at 00:34
  • @tadman If I declare it first, then it won't be dynamic – Tomas Berger Apr 30 '19 at 00:36
  • If you want it fully dynamic then you use `malloc`. You have to know the maximum size beforehand or this is not possible. As bhspencer says, a map may be what you want if you have no idea where these values of `x` and `y` are going to land. – tadman Apr 30 '19 at 00:37
  • @bhspencer it is very similar to what I am looking for, but do you have to hash to get the value? I really want to do it in O(1) time, and with hashing I risk two keys getting the same index, having to link them together and do it in 2 operations – Tomas Berger Apr 30 '19 at 00:42
  • @tadman I know they are going to land in between 0 - 1023, but I also know that not even 1/4 is regularly going to be occupied, so it seems like a waste of space to declare memory that won't be used. I did take a quick look at mapping, and it seems to be in the ballpark, but I have an issue with it (See previous comment) – Tomas Berger Apr 30 '19 at 00:45
  • If only 25% are going to be occupied you have to focus on what the actual cost is here. Would 1024 * 1024 * 4 really break the bank? That's only 4MB of memory. Unless you have thousands of these things or you're really strapped for memory then that's not going to be a big deal. You could save some memory by using `short int` or `char` instead if that was critical. Think about constraints both in terms of the values you must capture and the memory conditions you must operate within. – tadman Apr 30 '19 at 00:48
  • What I am trying to implement is a multi-level page table. And if it takes up 4MB of memory it defeats the point of a multi-level page table. They are supposed to be allocated as needed. But thanks a lot for the suggestions – Tomas Berger Apr 30 '19 at 00:53
  • @TomasBerger if you are using ints as the keys and you know there are going to be no collisions, then no, you don't need to hash your keys; just use the ints as indexes into your map directly. – bhspencer Apr 30 '19 at 14:33

3 Answers


No.

As written, your code makes no sense at all. You need `foo` to be declared somewhere, and then you can index into it with `foo[x][y] = someNr;`. But you can't just make `foo` spring into existence, which is what it looks like you are trying to do.

Either create `foo` with the correct sizes (only you can say what they are), e.g. `int foo[16][16];`, or use a different data structure. In C++ you could use a `map<pair<int, int>, int>`.

John3136
  • I know it makes no sense, the point was to illustrate what I wanted to do. I did take a look at mapping for C, and it seems that I have to hash to find the correct map in an array of mapping, yes? – Tomas Berger Apr 30 '19 at 00:46
  • There are different approaches. Hashing is probably the most common. "Sparse array" is a good search term for more research. As @tadman said in comments: 1024x1024 is not that expensive (perhaps don't put it on the stack) so whether you need to do some other data type or just allocate a big array is really your call. – John3136 Apr 30 '19 at 00:53
  • I know it's normally no big deal to take up 4MB of memory, but I am working on an OS, where on 32-bit it takes up 4MB per running program. But on 64-bit I would need something like a 2*2^25 int array – Tomas Berger Apr 30 '19 at 01:04
  • Well then looks like something like a sparse array is the way to go. – John3136 Apr 30 '19 at 01:31

Variable Length Arrays

Even if x and y were replaced by constants, you could not initialize the array using the notation shown. You'd need to use:

int fixed[3][4] = { someNr };

or similar (extra braces, perhaps; more values perhaps). You can, however, declare/define variable length arrays (VLA), but you cannot initialize them at all. So, you could write:

int x = 4;
int y = 5;
int someNr = 123;
int foo[x][y];

for (int i = 0; i < x; i++)
{
    for (int j = 0; j < y; j++)
        foo[i][j] = someNr + i * (x + 1) + j;
}

Obviously, you can't use x and y as indexes without writing (or reading) outside the bounds of the array. The onus is on you to ensure that there is enough space on the stack for the values chosen as the limits on the arrays (it won't be a problem at 3x4; it might be at 300x400 though, and will be at 3000x4000). You can also use dynamic allocation of VLAs to handle bigger matrices.

VLA support is mandatory in C99, optional in C11 and C18, and non-existent in strict C90.

Sparse arrays

If what you want is 'sparse array support', there is no built-in facility in C that will assist you. You have to devise (or find) code that will handle that for you. It can certainly be done; Fortran programmers used to have to do it quite often in the bad old days when megabytes of memory were a luxury and MIPS meant millions of instructions per second and people were happy when their computer could do double-digit MIPS (and the Fortran 90 standard was still years in the future).

You'll need to devise a structure and a set of functions to handle the sparse array. You will probably need to decide whether you have values in every row, or whether you only record the data in some rows. You'll need a function to assign a value to a cell, and another to retrieve the value from a cell. You'll need to think what the value is when there is no explicit entry. (The thinking probably isn't hard. The default value is usually zero, but an infinity or a NaN (not a number) might be appropriate, depending on context.) You'd also need a function to allocate the base structure (would you specify the maximum sizes?) and another to release it.

Jonathan Leffler

The most efficient way to create a dynamic index of an array is to create an empty array of the same data type that the array to index is holding.

Let's imagine we are using integers, for the sake of simplicity. You can then extend the concept to any other data type.

The ideal index depth will depend on the length of the data to index and will be somewhere close to the length of the data.

Let's say you have 1 million 64 bit integers in the array to index.

First of all you should order the data and eliminate duplicates. That's easy to achieve with qsort() (the C standard library's sorting function) and a remove-duplicates function such as:

uint64_t remove_dupes(const uint64_t *sorted_arr, uint64_t *dedup_arr, uint64_t arr_size)
{
    uint64_t i, j = 0;
    for (i = 1; i < arr_size; i++)
    {
        /* Keep each element that differs from its successor. */
        if (sorted_arr[i] != sorted_arr[i-1]) {
            dedup_arr[j] = sorted_arr[i-1];
            j++;
        }
        if (i == arr_size - 1) {    /* the last element is always kept */
            dedup_arr[j] = sorted_arr[i];
            j++;
        }
    }
    return j;
}

Adapt the code above to your needs; you should free() the source array once its unique values have been copied into the deduplicated array. The function above is very fast, but note that it returns zero entries when the input array contains a single element; that's probably something you can live with.

Once the data is ordered and unique, create an index with a length close to that of the data. It does not need to be of an exact length, although sticking to powers of 10 will make everything easier, in the case of integers.

uint64_t* idx = calloc(pow(10, indexdepth), sizeof(uint64_t));

This will create an empty index array. Then populate the index: traverse the array to be indexed just once, and every time the leading significant digits (as many as the index depth) change, record the position where the new prefix was first detected.

If you choose an indexdepth of 2 you will have 10² = 100 possible values in your index, typically going from 0 to 99.

When you detect that some number starts with 10 (e.g. 103456), you add an entry to the index. Let's say that 103456 was detected at position 733; your index entry would be:

index[10] = 733;

The next entry, beginning with 11, goes in the next index slot. Let's say the first number beginning with 11 is found at position 2023:

index[11] = 2023;

And so on.

When you later need to find some number in your original array storing 1 million entries, you don't have to iterate the whole array; you just need to check where in your index the first number starting with the first two significant digits is stored. Entry index[10] tells you where the first number starting with 10 is stored. You can then iterate forward until you find your match.

In my example I employed a small index, so each index segment covers on average 1000000/100 = 10000 entries, which bounds the linear scan you need to perform.

If you enlarge your index to somewhere close to the length of the data, the number of iterations will tend to 1, making any search blazing fast.

What I like to do is to create some simple algorithm that tells me what's the ideal depth of the index after knowing the type and length of the data to index.

Please note that in the example I have posed, 64-bit numbers are indexed by their first indexdepth significant digits, so 10 and 100001 will be stored in the same index segment. That's not a problem on its own; nonetheless, each master has his small book of secrets. Treating numbers as fixed-length hexadecimal strings can help keep a strict numerical order.

You don't have to change the base, though: you could consider 10 to be 0000010 to keep it in the 00 index segment and keep base-10 numbers ordered. Switching numerical bases is in any case trivial in C, which is of great help for this task.

As you make your index depth larger, the number of entries per index segment is reduced.

Please do note that programming, especially at a lower level like C, consists in great part of understanding the trade-off between CPU cycles and memory use.

Creating the proposed index is a way to reduce the number of CPU cycles required to locate a value, at the cost of using more memory as the index becomes larger. This is nonetheless the way to go nowadays, as massive amounts of memory are cheap.

As SSD speeds get closer to that of RAM, storing indexes in files is worth taking into account. Modern OSs nevertheless tend to cache as much as they can in RAM, so using files would end up performing similarly.

Daniel J.