1

I have an integer array of size 5. The problem is that the array accepts 6 elements instead of 5, but when I print the array I get the desired output, i.e. 5 elements. Why does this happen?

These were some of the questions which showed some similarity to my question... but the fact is that these are somewhat advanced for me with classes, structures, etc... And also most of these questions talked about character array but my question is of integer array (PS: I don't know if that really matters)

Array with only 1 element storing more than it should

Why allocate an array of size 1 more than the requested size?

Array contains more elements than declared size. Why does it do that?

C program displaying more characters than array size

I searched further and the closest answer I found was: "Array may not be properly null terminated." But I don't know of any delimiter for integer arrays, only the string terminator, which I believe is "\0".

// my code
#include <stdio.h> 

int main(void) {
    int ar[5];
    printf("Enter the array values: ");
    for(int i = 0; i < 5; i++) {
        scanf("%d ", &ar[i]);
    }
    printf("Array elemnts are: ");
    for(int j = 0; j < 5; j++) {
        printf("\n%d", ar[j]);
    }

    return 0;
}

output

shaderone
  • 3
    You are writing into memory that is not yours. Anything can happen! – Paul Ogilvie May 11 '21 at 07:11
  • 1
    But your code does not show using 6 elements. `0..4` is 5 elements – Paul Ogilvie May 11 '21 at 07:13
  • 1
    C does not care how long your array is and will happily allow you to write past the end of it, even if that memory is not free – mousetail May 11 '21 at 07:13
  • Where do you think is your problem with the above code? – Devolus May 11 '21 at 07:17
  • 1
    What mousetail said. C does not have runtime checking for going out of bounds in an array as other languages do. It's pretty much hardwired to memory addresses, and if you touch memory you don't own or properly allocate, you're in "undefined behavior" territory. It might crash correctly. Or it might seem to work. Or it might cause other weird behavior later. – selbie May 11 '21 at 07:17
  • No, wrong. See the answer below. As said, _you write into memory that is not yours_. – Paul Ogilvie May 11 '21 at 07:28
  • Now that I have found the solution to my question, how can I close this? – shaderone May 11 '21 at 12:46
  • Should I mark my own answer as the accepted one? As I said, it's actually my very first question, so how should I proceed now? – shaderone May 11 '21 at 12:49
  • 1
    I would say yes, mark your own answer since you found the solution for this specific case. The rest of us just answered your question without solving your issue. – Emoun May 12 '21 at 11:58

3 Answers

3

In C, array bounds are not checked in any way. Therefore, you can freely try to access an element outside the size of the array. However, doing so is undefined behavior and may cause your computer to shoot demons out of your nose.

So, while working on arrays, you must yourself ensure that you only access elements in the array. You will get no error from the compiler if you do this wrong.

Emoun
3

As funny as it sounds, this was caused by the space after %d in the scanf statement. When that space was removed, I was able to input and output exactly 5 elements.

[Before removing the space:] https://i.stack.imgur.com/Gv13S.png

[After removing the space:] https://i.stack.imgur.com/TvdvG.png

shaderone
  • 1
    Good on catching this. It's not that you were entering extra values into the array, it was that the `scanf` format expected some non-whitespace characters before it would return. – dbush May 11 '21 at 12:49
  • 1
    That's embarrassing... I totally overlooked that. Good job on solving it. – Marcus Harrison May 11 '21 at 22:08
  • @dbush Thanks for the clarification. Also, could you please tell me what happens to that unwanted element? Does it simply disappear? I am just curious to know. – shaderone May 12 '21 at 07:43
  • 1
    @PixieDust It's sitting in the input buffer. If you were to call `scanf` once more with a `"%d"` format string it would read that value without any additional prompting. – dbush May 12 '21 at 11:47
1

C does not feature boundary-checked types, since it would introduce a lot more instructions at run-time and - in some cases - limit some of the use-cases of C.

In many cases, arrays are transparently converted into pointers. The square-brackets index operator is largely a wrapper around pointer arithmetic, so understanding pointers is an important aspect of understanding C and its behaviours, so I'll do my best to explain them.

When you declare a variable of a type in one of your functions (or globally), your program reserves a certain amount of memory for that type. When you declare an array of a variable type, the program reserves enough memory for that many elements.

The C standard only defines a fixed size for the char type - 1 byte. Other types, such as int, can be defined as different sizes depending on the compiler or the processor your compiler is building for - typically based on the number of bytes the processor can efficiently operate with.

For example, a 32-bit processor can efficiently add, subtract, multiply and integer-divide 32-bit numbers, but it might take multiple operations for it to do similar computations with 64-bit numbers - for example, by adding the first 32-bit parts, storing any carry-over values, adding the second two 32-bit parts and the carry.

For a 32-bit processor, then, an int is usually defined as 4 bytes, or (32 bits)/(8 bits per byte). Similarly, 64-bit processors may use a 64-bit int, etc.

You can find out the sizes of types for your compiler/processor combination with the sizeof operator:

#include <stdio.h>

int main()
{
    /* sizeof yields a size_t, which is printed with %zu */
    printf("%zu\n", sizeof(int));
}

On x86 processors (Intel's architecture, which desktop/laptop AMD processors also implement), the above code prints 4 even in 64-bit builds, because the usual compilers for those systems keep int at 32 bits. This is for technical reasons I won't go into, but hopefully you get the idea.

So when you declare a variable as an array of 4 chars, your function reserves 4 * 1 = 4 bytes for it, starting at the beginning of the array, with each element right next to each other. Similarly, when you declare an array of 4 ints, it reserves 4 * 4 = 16 bytes, one next to the other.

Below is a visualization of what happens in memory when you declare arrays of different sizes/types:

+---------------+-----+-----+-----+-----+-----+-----+-----+-----+
|char foo[4]    | 1st | 2nd | 3rd | 4th | --- | --- | --- | --- |
+---------------+-----+-----+-----+-----+-----+-----+-----+-----+
|int foo[2]     | 1st | 1st | 1st | 1st | 2nd | 2nd | 2nd | 2nd |
+---------------+-----+-----+-----+-----+-----+-----+-----+-----+

Now let's take a look at how C pointers help you move around memory.

A pointer carries two pieces of information:

  • The location in memory we're interested in;
  • The type to treat the value at that location as (this part is known to the compiler; only the address is stored at run time).

It can be confusing to reason about both at once, so let's start with a pointer which only carries a memory address - void *.

We can get the address of a variable with the & operator, and print it with printf("%p"):

#include <stdio.h>

int main()
{
    int a = 0;
    int b = 42;
    
    void *a_pointer = &a;
    void *b_pointer = &b;
    
    printf("%p: %d\n", a_pointer, a);
    printf("%p: %d\n", b_pointer, b);
}

On my system, the above program prints something like this:

~/playground$ ./pointers 
0x7ffd54bda370: 0
0x7ffd54bda374: 42
~/playground$

As you can see, the program has put a and b right next to each other - b's memory address ends with ...374, 4 bytes more than a's address ...370.

The compiler and operating system make decisions about how to (re-)arrange variables and align them to make them quick to access - in this case, it's decided that placing a and b right next to each other results in the fastest access and lowest memory consumption.

Let's take a look at a similar program with an array of integers instead:

#include <stdio.h>

int main()
{
    int a[2] = { 0, 42 };

    void *a_first = &a[0];
    void *a_second = &a[1];

    printf("%p: %d\n", a_first, a[0]);
    printf("%p: %d\n", a_second, a[1]);
}

And let's see what it prints:

~/playground$ ./pointers 
0x7ffe51de1fb0: 0
0x7ffe51de1fb4: 42
~/playground$

If you look at the memory addresses, you can see the same thing: the first is at address ...fb0, while the second is at address ...fb4, 4 bytes after.

There are important differences between these two cases:

  • In the first program, we didn't make any decisions about how the variables should be arranged - in functions with more variables, and variables of different sizes, the compiler is allowed to re-arrange them in memory arbitrarily, if the overall effect is the same and it allows the program to run faster. That means we can't rely on the arrangement of those variables in memory.
  • In the second program, we explicitly told the compiler to put the two integers next to each other. No matter what else is in the function, we have a guarantee that a[1] will always be placed 1 int after a[0].

We've looked at the addresses of these raw pointers - and that's about as much as we can do with them, since C doesn't know how to operate on them or their values. To tell C what is possible with these pointers, and what we can do with the pointed-to values, we need to tell it the type pointed to.

Now we can do things like get the value from the pointer directly - we couldn't do that before, because C has no idea how much memory to access from void *.

#include <stdio.h>

int main()
{
    int a[2] = { 0, 42 };
    
    int *a_pointer = &a[0];
    
    printf("%p: %d\n", (void *)a_pointer, *a_pointer);
}

With the information that a_pointer points to an int, C now knows to read only sizeof(int) bytes (4 on my system) when we ask for the value. You can see in the printf statement that we can ask for the value with the star operator *. We could also write to the value, by doing *a_pointer = 84; for example.

That's not all we can do with this pointer - we can "seek through" memory by increasing or decreasing this pointer.

#include <stdio.h>

int main()
{
    int a[2] = { 0, 42 };
    
    int *a_pointer = &a[0];
    
    printf("%p: %d\n", (void *)a_pointer, *a_pointer);
    
    // re-assign the pointer to the next `int`
    a_pointer++;
    printf("%p: %d\n", (void *)a_pointer, *a_pointer);
    
    // a_pointer-- works too
    a_pointer--;
    // You can also do standard arithmetic on it
    printf("%p: %d\n", (void *)(a_pointer + 1), *(a_pointer + 1));
}
~/playground$ ./pointers 
0x7fff23cc1b10: 0
0x7fff23cc1b14: 42
0x7fff23cc1b14: 42
~/playground$

Take a look at the last thing we do in the last printf statement - *(a_pointer + 1). This is exactly what C does when you use square brackets to seek through an array - a[1] == *(a_pointer + 1).

In fact, I've been doing int *a_pointer = &a[0]; for each of these programs - getting the memory address of the first member of the a array - when you can actually just assign the array directly to the pointer: int *a_pointer = a;. You can even use the square-brackets access on the pointer itself to do exactly the same thing:

#include <stdio.h>
  
int main()
{
    int a[2] = { 0, 42 };

    int *a_pointer = a;

    printf("%d\n", a[0]);
    printf("%d\n", a_pointer[0]);

    printf("%d\n", a[1]);
    printf("%d\n", a_pointer[1]);
}
~/playground$ ./pointers 
0
0
42
42
~/playground$

Now, finally, we can start to answer your question.

Accessing values in an array in C happens as operations on pointers-to-memory. The fundamental problem is, we want the program to run as quickly as possible and with as little resource consumption as possible - but that results in situations like the first program, compared to the second: even though we wrote different code, it resulted in a very similar layout in memory.

If you just looked at the memory addresses those programs printed to console, you would not be able to tell which came from an array of integers and which came from two separate integer variables. Introducing logic in the program to check if accesses like that have "overstepped their bounds" - or accessed a different variable, vs. the next element of the same variable - would introduce a lot more run-time logic and reduce the performance of the program overall.

In these simple cases, a human can easily look at the code and say what valid values for the pointers should be - but it gets more complicated in programs (or libraries) which need to operate with different amounts of data. It may even be that they don't know how much memory they can scan until the user runs the program - a programmer can't know, for example, how many E-mails a user has, or how many friends are in their friends list.

Marcus Harrison