2

Here is my minimal reproducible example:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf(" this is the contents of argc:%d\n", argc);

    int i;

    for (i = 0; i < argc; i++) {
        printf(" argv = %d = %s\n", i, argv[i]);
    }

    return 0;
}

When I replace argc in the for loop with a fixed number, let's say 10, the code crashes before it reaches 10:

$ ./argc one two three
 this is the contents of argc:4
 argv = 0 = ./argc
 argv = 1 = one
 argv = 2 = two
 argv = 3 = three
 argv = 4 = (null)
 argv = 5 = SHELL=/bin/bash
 argv = 6 = SESSION_MANAGER=local/wajih:@/tmp/.ICE-unix/1230,unix/wajih:/tmp/.ICE-unix/1230
 argv = 7 = QT_ACCESSIBILITY=1
 argv = 8 = COLORTERM=truecolor
 argv = 9 = XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg

If I, for example, change argc in the for loop to 100, I get very long output, which ends with this:

 argv = 54 = GDMSESSION=ubuntu
 argv = 55 = DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
 argv = 56 = LC_NUMERIC=ar_AE.UTF-8
 argv = 57 = _=./argc
 argv = 58 = OLDPWD=/home/wajih
 argv = 59 = (null)
Segmentation fault (core dumped).

I want to understand the reason this happens.

blackbrandt
  • 4
    Welcome to SO. If you already know that you have `argc` elements in the array, what do you expect to see if you look into more than you were provided? – Gerhardh May 17 '23 at 07:16
  • 4
    You are simply accessing the array passed to your function beyond its end which is invoking undefined behaviour. Anything can happen. It can crash immediately at the next entry or seem to work. Note: The terminating `NULL` entry in `argv[argc]` is safe to read. Anything beyond is UB. – Gerhardh May 17 '23 at 07:17
  • 2
    Going out of bounds of an array leads to *undefined behavior*. It's your responsibility as the programmer to make sure that your program doesn't do that. – Some programmer dude May 17 '23 at 07:20
  • 5
    More practically what happens, is that on some systems (like Linux) there's actually a *third* argument passed to the `main` function, which is an array of `char *` pointers (like `argv`) for the *environment variables*. It's used as `int main(int argc, char *argv[], char *envp[])`. Just like `argv` this `envp` array is null-pointer terminated, so when `envp[i]` is a null pointer then you should not continue. – Some programmer dude May 17 '23 at 07:22

5 Answers

5

It might be easier to understand what's going on here with an analogy.

Suppose I live in a long, narrow house. The house is divided into 10 rooms, but they're all the same size and they're all arranged in a straight line.

Suppose I'm interested in robotics. Suppose I build a little robot to drive around inside my house, taking pictures of each room. Because my house's rooms are all laid out in a straight line, the robot's navigation task is pretty simple.

Once I've got the robot's software working perfectly, I ask the robot to make a complete photographic survey of all 20 rooms in my house. (Oops, I made a mistake, there.) And the robot starts driving along the main axis of the house taking pictures of each room in turn.

After it takes pictures of the first 10 rooms, there's a crashing sound as the robot drives through the end wall of the house. Its pictures of the "11th room" are of splintered wood and plaster. Its pictures of the "12th room" are of the garden outside the end of my house. But then there's another crashing sound, and the robot keeps taking pictures, and somehow, remarkably, they look like the insides of a house again!

It turns out that's because the robot has driven into my neighbor's house and is now taking pictures there.

From this silly little story we can learn two things:

  1. If there are 10 rooms in my house, and I ask my simpleminded robot to take pictures of 20 rooms, something strange, unpredictable, and wrong is probably going to happen.
  2. Even though what happens is going to be strange, unpredictable, and wrong, little bits of it can seem to make some kind of sense, depending on circumstances. In this case, my robot's picture of the "15th room" of my house looked just like a bedroom, although it didn't look like any bedroom in my house, and what the two people were doing in bed there didn't look like anything that happens in my house, either...

But the other important aspect of the analogy is that you obviously can't depend on any of it, because too many of the circumstances are outside of your control. The robot might have damaged itself so badly driving through walls that it couldn't continue taking pictures. If there happened to be a street just past the garden at the end of my house, the robot might have gotten run over by a truck. If there happened to be a cliff just past the garden at the end of my house, the robot might have fallen into the ocean. Etc.

C, like the simpleminded robot in my story, does not have any built-in protections against running off the end of arrays. If you try to access the 15th element of a 10-element array, what you don't typically get is an error message saying "Array bounds exceeded." What you get instead is something strange, unpredictable, and wrong — except that, depending on circumstances, there might seem to be some kind of hidden meaning, which might lead you to waste time trying to figure it out, or asking about it on Stack Overflow. But rather than doing that, you might want to spend your time working on a better obstacle detection or collision avoidance algorithm for the robot, instead. :-)

Steve Summit
3

The argv pointer has a very specific location in the program's memory.

When you run a binary, there is always some entry point. From the point of view of a C program, that is the main() function. But, in order to prepare the environment for the binary to start at that location, the OS has to do some things first.

It has to copy over the command-line arguments and the environment variables, set up the stack, etc. Because this process is deterministic for a given OS, on Linux the environment variable pointers typically end up just after the argument pointers, which is why you can read them by running off the end of argv. The C standard makes no such promise, though.

Example memory layout on Linux

This layout matters for computer security. If an attacker manages to leak a pointer into this segment of memory and can write through it, they can overwrite some environment variable (e.g. PATH) so that it points to their own binary first. HackMD has a really nice example of this: HackMD: Environment variables attack.


Image source: COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY

2

You are invoking undefined behaviour. The C Standard says that argv[argc] will be a null pointer, and that trying to access argv[i] for i < 0 or i > argc is undefined behaviour.

"Undefined behaviour" means anything can happen. If you ask for an explanation, there is none other than "it is undefined behaviour". It is legal for the compiler to produce code that completely erases your hard drive after sending all your money to my bank account. Don't do it. You are doing things that you are not allowed to do, and that's the complete answer.

gnasher729
1

Going past the end of an array is undefined behavior in C. The results vary depending on the compiler, the operating system, the shell you use, and a lot of other factors.

In this specific case, you are listing environment variables, because your main function is passed not just the arguments in argv but also a list of environment variables in envp, and by coincidence, the envp pointers happen to sit right after the argv array. You can never rely on that being true.

int main(int argc, char *argv[], char *envp[]);

In summary, don't go past the end of the array. It will lead to Bad Things™.

If your program needs to use the values of environment variables, you must do so through the envp array, and not abuse undefined behavior through the argv array.

Anders Marzi Tornblad
0

Most Unix systems provide a third argument to the main function.

int main( int argc, char *argv[], char *envp[]);

It holds the environment variables. In the above case the loop happens to print the contents of the third argument, envp. But it will not always show the same behavior. Reading argv beyond index argc is undefined behavior.

  • 2
    That is only true if the 2 arrays holding the pointers are also adjacent in memory. Passing their addresses in 2 consecutive parameters does not really ensure that. One should never ever rely on that. Also this still involves undefined behaviour as accessing an array beyond its limit is not allowed. – Gerhardh May 17 '23 at 07:47