10

I always thought that argc was required to mark the end of argv, but I just learned that argv[argc] == NULL by definition. Am I right in thinking that argc is totally redundant? If so, I always thought C did away with redundancy in the name of efficiency. Is my assumption wrong, or is there a historical reason behind this? If the reason is historical, can you elaborate?

Jonathon Reinhart
elik
  • 1
    Couldn't `NULL` be an element of `argv`? That is, before the actual end of the array. – Andras Deak -- Слава Україні Aug 13 '15 at 23:07
  • 4
    @AndrasDeak, I don't think so. An element of `argv` could be an empty string, that is an array of just one element, a `0`-byte. – Jens Gustedt Aug 13 '15 at 23:12
  • 4
    Yes it is redundant. The reason is "historical reasons" – M.M Aug 13 '15 at 23:13
  • 1
    I guess you could argue it is a speed optimization to read `argc` instead of iterating over `argv`. – M.M Aug 13 '15 at 23:14
  • 2
    Plus the ability to say `if (argc < 3) { printf("error message"); return 1; }` without looping the `argv` list first. Not to mention various other choices that might be made based on the number of arguments (read files from command-line args vs. reading stdin, etc.) – Paul Roub Aug 13 '15 at 23:14
  • How would you get argv to have an empty string with non empty strings following it? – David Zech Aug 13 '15 at 23:16
  • 5
    @Nighthawk441 you could call it like `execname 'arg1' 'arg2' '' 'arg4'`, in which case `argv[3]` is an empty string. And, as @Jens said, that's not `NULL`. – Andras Deak -- Слава Україні Aug 13 '15 at 23:21
  • You could invoke a program via `execv()` and give it an array with null pointers in the middle of it. This would be a bad idea; the program's behavior would be undefined. The C standard specifically requires the pointers `argv[0]` through `argv[argc-1]` to be pointers to strings, which means they can't be null pointers. – Keith Thompson Aug 13 '15 at 23:42
  • See also [What should `main()` return in C and C++](http://stackoverflow.com/questions/204476/what-should-main-return-in-c-and-c/18721336#18721336), which quotes what the standard says. The reason for the redundancy is primarily historical (that's how it was done in C in the mid-70s, so that's how it has been done ever since). And now, of course, there's a quarter century of it being standardized behaviour, and changing it would break a lot of code. – Jonathan Leffler Aug 13 '15 at 23:44
  • 1
    @KeithThompson: `execve()` only knows the length of the argument list by coming across the first null pointer. The extras 'in the middle' simply don't count. It's a little more debatable what happens if the zeroth argument is a null pointer. The standard permits `argc == 0`, and still requires `argv[argc] == 0`. – Jonathan Leffler Aug 13 '15 at 23:47
  • @JonathanLeffler: You're right, and I was wrong. You could pass *invalid* argument pointers via any of the `exec*()` functions, but the end of the list is defined by a null pointer (either as an argument for the variadic functions `execl`, `execlp`, and `execle`, or as the last element of the array for `execv` and `execvp`). And the `argc` value passed to the invoked program is computed from that. (You could pass invalid pointers, which could make the invoked program unhappy, but that's a different thing.) – Keith Thompson Aug 14 '15 at 00:03

2 Answers

6

History.

Harbison & Steele (5th Edition, 9.9 "The main program") says the following:

Standard C requires that argv[argc] be a null pointer, but it is not so in some older implementations.
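
To make the redundancy concrete, here is a minimal sketch of recovering the argument count from the null sentinel alone. It assumes a conforming hosted implementation where argv[argc] is a null pointer; the helper name count_args is hypothetical and not part of any standard library.

```c
#include <stddef.h>

/* Hypothetical helper: count the arguments by walking the vector
 * until the null pointer that Standard C guarantees at argv[argc].
 * An empty string ("") is a valid argument and does not stop the walk,
 * because it is a non-null pointer to a single '\0' byte. */
size_t count_args(char *const argv[])
{
    size_t n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Called from main as count_args(argv), this should return the same value as argc on such an implementation. On the older implementations H&S mention, where the sentinel is absent, the loop would run past the end of the array, which is one practical reason to keep relying on argc.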

Michael Burr
  • It would be helpful to have some indication of which 'older implementations' don't have `argv[argc]` as a null pointer — I suspect H&S don't provide that level of detail, though. They'd have to be pretty old these days. (I was never unlucky enough to come across one, but there are plenty of esoteric platforms that I've not programmed on.) – Jonathan Leffler Aug 13 '15 at 23:49
  • FWIW I managed to find a scanned PDF of K&R 1st ed., and as far as I can tell they never mention a null sentinel at `argv[argc]` and all examples use `argc` to determine the end of the `argv[]` array. The 2nd Edition points out the null sentinel, but doesn't use it in any examples. – Michael Burr Aug 14 '15 at 00:28
3

Here's the history.

In first edition UNIX, which predates C, exec took as arguments a filename and the address of a list of pointers to NUL-terminated argument strings terminated by a NULL pointer. From the man page:

sys exec; name; args      / exec = 11.
name: <...\0>
...
args: arg1; arg2; ...; 0
arg1: <...\0>
...

The kernel counted up the arguments and provided the new image with the arg count followed by a list of pointers to copies of the argument strings, at the top of the stack. From the man page:

sp--> nargs
      arg1
      ...
      argn

arg1: <arg1\0>
...
argn: <argn\0>

(The kernel source is here; I haven't looked to see if the kernel actually wrote something after the pointer to the last argument.)

At some point, up through the 6th edition, the documentation for exec, execl, and execv began to note that the kernel placed a -1 after the arg pointers. The man page says:

Argv is not directly usable in another execv, since argv[argc] is -1 and not 0.
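
As an illustration of what that note implies, a program that wanted to hand its arguments to another execv would first have to build a copy ending in 0 rather than -1. The following is a sketch in modern C, not period code; the helper name copy_argv_terminated is hypothetical, and it relies on argc rather than on any sentinel.

```c
#include <stdlib.h>

/* Hypothetical helper: copy the first argc pointers of argv into a
 * freshly allocated vector and terminate it with a null pointer, so the
 * result has the shape execv expects. The strings themselves are shared,
 * not duplicated. Returns NULL on invalid argc or allocation failure;
 * the caller frees the returned vector. */
char **copy_argv_terminated(int argc, char *const argv[])
{
    if (argc < 0)
        return NULL;

    char **out = malloc(((size_t)argc + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (int i = 0; i < argc; i++)
        out[i] = argv[i];
    out[argc] = NULL;   /* the terminator that a -1 sentinel did not provide */

    return out;
}
```

Because the copy is bounded by argc, it works regardless of what, if anything, the system stored after the last argument pointer.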

At this point, you could argue that argc was redundant, but programs had, for some time, been using it rather than looking through the argument list for -1. For example, here's the beginning of cal.c:

main(argc, argv)
char *argv[];
{
    if(argc < 2) {
        printf("usage: cal [month] year\n");
        exit();
    }

In 7th edition, exec was changed to add a NULL pointer after the argument strings, and this was followed by a list of pointers to the environment strings, and another NULL. The man page says:

Argv is directly usable in another execv because argv[argc] is 0.

Mark Plotnick