18

I'm reading a section from C Primer Plus about the command-line argument argv and I'm having difficulty understanding this sentence.

It says that,

The program stores the command line strings in memory and stores the address of each string in an array of pointers. The address of this array is stored in the second argument. By convention, this pointer to pointers is called argv, for argument values.

Does this mean that the command line strings are stored in memory as an array of pointers to array of char?

haccks
Jin
  • 1
    `Does this mean that the command line strings are stored in memory as an array of pointers to array of char? ` Yes. IMHO the whole confusion is caused by `The program stores the command line strings in memory ...` ; the point is that all this happens **before main() is called**. Main() is just a function, which is called with two arguments: an int and a pointer to an array of string pointers. – joop Aug 23 '16 at 09:39
  • 1
    @joop argv isn't "a pointer to an array of string pointers", if we're being pedantic. This whole question is about the difference between "pointer to an array" and "pointer to the first element of an array", really. – M.M Aug 23 '16 at 11:04
  • The OP's confusion is IMHO about the external side ("crt0"), which sets up the args, and the internal side (main()), which receives them. That is also the cause of the difference between the (perceived: decayed) types. Really. – joop Aug 23 '16 at 11:13
  • @joop this is a "language lawyer" question which means it is about Standard C, in which there is no "crt0" and the setup of the arguments doesn't matter, so long as `argv` behaves as specified in the C Standard – M.M Aug 23 '16 at 11:25
  • The "language-lawyer" tag was added later (by someone who did not understand the *nature* of the question, IMHO) And I quoted "crt0" for a reason. Really. – joop Aug 23 '16 at 11:29
  • @joop: On Linux (and other OSes that use the SysV ABI), the `argv` array is in memory at process startup, in a format suitable for passing by reference to `main`. So the `crt0` libc startup code doesn't have to do anything with argv except pass a pointer to it to `main()`. In Linux, the kernel puts argv and the environment block at the top of the user-space stack. [The x86 flavours of the System V ABI are online here](https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI). – Peter Cordes Aug 23 '16 at 19:03
  • Possible duplicate of *[Command line arguments: argv](http://stackoverflow.com/questions/32898100/command-line-arguments-argv)* – Peter Mortensen Aug 23 '16 at 19:43
  • Another candidate is *[What does int argc, char *argv\[\] mean?](http://stackoverflow.com/questions/3024197)*. – Peter Mortensen Aug 23 '16 at 19:55

7 Answers

27

argv is of type char **. It is not an array; it is a pointer to a pointer to char. The command-line arguments are stored in memory as strings, and the address of the first character of each string is stored in an array. This array is an array of pointers to char, and argv points to the first element of that array.

                  Some
                  array

                 +-------+        +------+------+-------------+------+
argv ----------> |       |        |      |      |             |      |
                 | 0x100 +------> |      |      | . . . . . . |      |  Program Name [1]
         0x900   |       |        |      |      |             |      |
                 |       |        +------+------+-------------+------+
                 +-------+         0x100  0x101
                 |       |        +------+------+-------------+------+
                 | 0x205 |        |      |      |             |      |
         0x904   |       +------> |      |      | . . . . . . |      |  Arg1
                 |       |  .     |      |      |             |      |
                 +-------+        +------+------+-------------+------+
                 |  .    |  .      0x205  0x206
                 |  .    |
                 |  .    |  .
                 |  .    |
                 +-------+  .     +------+------+-------------+------+
                 |       |        |      |      |             |      |
                 | 0x501 +------> |      |      | . . . . . . |      |  Arg[argc-1]
                 |       |        |      |      |             |      |
                 +-------+        +------+------+-------------+------+
                 |       |         0x501  0x502
                 | NULL  |
                 |       |
                 +-------+


0xXXX represents a memory address.


1. In most cases argv[0] represents the program name. If argc is greater than zero and the program name is not available from the host environment, argv[0][0] is the null character.

haccks
  • Is the address of the whole string (&"string") stored in the array? – Jin Aug 23 '16 at 08:28
  • @Jin; No. It is the address of the first element of the string. – haccks Aug 23 '16 at 08:29
  • @haccks `argv is of type char **. It is not array.`.... I'm confused... Can you please refer my answer once? – Sourav Ghosh Aug 23 '16 at 08:30
  • @SouravGhosh; I am more confused than you. I don't know why standard says it *array*. Let me go through it in detail. – haccks Aug 23 '16 at 08:31
  • @haccks Then I guess my argument that "command line strings are stored in memory as an array of pointers to array of `char`? " is wrong! It should be corrected as "...as an array of pointers to `char`" – Jin Aug 23 '16 at 08:32
  • @Jin; Yes. It should be. – haccks Aug 23 '16 at 08:33
  • @Jin To be nitpicky, it should read `....pointers to null-terminated char array`...see the last line of my answer. :) – Sourav Ghosh Aug 23 '16 at 08:34
  • @SouravGhosh But as haccks says, shouldn't it be array of pointers to `char`, not array of pointers to `char` array? – Jin Aug 23 '16 at 08:36
  • 1
    @SouravGhosh; OK. Ultimately `argv` is pointing to the first element of an array of `char *` and that can be the reason standard referred it as array in this particular case. But the type of `argv` is `char **`. – haccks Aug 23 '16 at 08:37
  • 2
    Really helpful ASCII schema ! – Tim Aug 23 '16 at 09:41
  • 3
    As a nitpick, "some array" should have one more element that stores a null pointer. – jamesdlin Aug 23 '16 at 12:44
  • @jamesdlin; Good catch. – haccks Aug 23 '16 at 12:49
  • 3
    Also notable: argv[0] isn't guaranteed to hold the program name (but does, most of the time) but only something that *represents* the program name: http://stackoverflow.com/q/2050961/1116364 – Daniel Jour Aug 23 '16 at 13:39
  • @DanielJour: Isn't "represents" just there because filenames can be represented in other ways than as "null-terminated multibyte strings" (for example, NTFS uses a UTF-16 encoding), and they needed to specify *which* representation they're using here? There are bigger issues here than the word "*represents*", like the fact that "the name used to invoke the program" isn't very specific -- it needn't be a filename (e.g. Unix login shells), and even if it was, nobody said what directory or filename extension might have been used to resolve it. Or it could just be `""`. – SamB May 20 '17 at 22:11
17

Directly quoting from C11, chapter §5.1.2.2.1/p2, program startup, (emphasis mine)

int main(int argc, char *argv[]) { /* ... */ }

[...] If the value of argc is greater than zero, the array members argv[0] through argv[argc-1] inclusive shall contain pointers to strings, [...]

and

[...] and the strings pointed to by the argv array [...]

So, basically, argv is a pointer to the first element of an array of strings (see the NOTE below). This can be made clearer from the alternative form,

int main(int argc, char **argv) { /* ... */ }

You can rephrase that as pointer to the first element of an array of pointers to the first element of null-terminated char arrays, but I'd prefer to stick to strings.


NOTE:

To clarify the usage of "pointer to the first element of an array" in the above answer, here is §6.3.2.1/p3:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue. [...]

Sourav Ghosh
  • Then is the address of the whole string(&"string") stored in the array? Or is the address of the initial element of the string stored in the array? – Jin Aug 23 '16 at 08:49
  • @Jin Address of the initial element of the string. – Sourav Ghosh Aug 23 '16 at 08:50
  • 1
    ...and since the array is passed to a function (`main`) it collapses to a pointer. So `argv` is a pointer that points to an array. Try `sizeof argv` and `argv++` and you will see that `argv` is a pointer. – Klas Lindbäck Aug 23 '16 at 08:52
  • Then why do you say that "...as array of pointers to null-terminated `char` arrays", not "...as array of pointers to null-terminated `char`"? I think there is a difference between those two sentences.. or am I wrong? – Jin Aug 23 '16 at 08:52
  • @Sourac Ghosh Sorry my bad! I thought "pointer to a null-terminated `char` arrays" means pointer to the whole string(&"String") – Jin Aug 23 '16 at 08:57
  • @KlasLindbäck OK, appended the answer to clear out any confusions. :) – Sourav Ghosh Aug 23 '16 at 09:02
  • @KlasLindbäck; *So `argv` is a pointer that points to an array*: No. `argv` is a pointer that points to a pointer to a `char`. – haccks Aug 23 '16 at 09:10
  • @SouravGhosh a string is something that can be stored in an array. A string is not an array (or vice versa) – M.M Aug 23 '16 at 09:21
  • @haccks If an array is passed to `main` then `argv` points to an array. The type of `argv` is pointer to pointer to char, though. – Klas Lindbäck Aug 23 '16 at 09:48
  • @KlasLindbäck; In C you can't pass an array to a function but a pointer to the first element of the array. – haccks Aug 23 '16 at 10:02
  • @M.M Yes, in general you're very right, but isn't it like that a _string_ and a _null terminated char array_ are same things? §7.1.1/1 `A string is a contiguous sequence of characters terminated by and including the first null character.` – Sourav Ghosh Aug 23 '16 at 10:05
  • @haccks That's what I've been saying. – Klas Lindbäck Aug 23 '16 at 10:07
  • @SouravGhosh "array" is not mentioned in your quote . You wouldn't say that *10* was the same thing as *an int variable* would you? – M.M Aug 23 '16 at 10:20
  • @M.M Nopes, obviously not. :) Which statement in my answer you're referring to exactly, please? (or if you want, please feel free to edit it). – Sourav Ghosh Aug 23 '16 at 10:23
  • I'm referring to your comment "a null-terminated char array is a string". should be "a null-terminated char array contains a string" . "string" is to "array" as "10" is to "int variable" – M.M Aug 23 '16 at 10:26
  • @M.M Agreed. Poor choice of words from my side. Cannot edit that now, will delete and add a new one. – Sourav Ghosh Aug 23 '16 at 10:30
  • @Jin For your [comment], the proper reply should be, there is no null-terminated `char`. A null-terminated `char` array contains (or, called as) a _string_. – Sourav Ghosh Aug 23 '16 at 10:32
11

This thread is such a train wreck. Here is the situation:

  • There is an array with argc+1 elements of type char *.
  • argv points to the first element of that array.
  • There are argc other arrays of type char and various lengths, containing null-terminated strings representing the command-line arguments.
  • The elements of the array of pointers each point to the first character of one of the arrays of char; except for the last element of the array of pointers, which is a null pointer.

Sometimes people write "pointer to array of X" to mean "pointer to the first element of an array of X". You have to use the contexts and types to work out whether or not they actually did mean that.

M.M
1

Yes, exactly.

argv is a char** or char*[], or simply an array of char* pointers.

So argv[0] is a char* (a string) and argv[0][0] is a char.

blue112
  • If it were an array of zero-terminated strings (`char[]`, not `char*`), it'd just be `char[]`. –  Aug 23 '16 at 08:19
  • @Rhymoid Not zero terminated chars, zero terminated string**s** – blue112 Aug 23 '16 at 08:20
  • can you give an example of non-zero-terminated _string_s? – Sourav Ghosh Aug 23 '16 at 08:20
  • 1
    @blue112 It's confusing to describe it as such. There's also the practice of storing `"a list of\0strings like\0so\0"`, which is a (zero-terminated) array of zero-terminated strings. This is *not* how `argv` works, but it *is* used (e.g. in the `cmdline` procfile of the Linux kernel). `char*` is not a string. –  Aug 23 '16 at 08:21
  • @SouravGhosh `{_size:5, data: {'a','b','c','d','e'}}` Here's one. – blue112 Aug 23 '16 at 08:21
  • @blue112 ok, so `data` is a ___string___ ? – Sourav Ghosh Aug 23 '16 at 08:22
  • 1
    @haccks it points to the first element of an array of char pointers. – M.M Aug 23 '16 at 09:12
  • @M.M; That doesn't make `argv` an array of `char *`. – haccks Aug 23 '16 at 09:13
  • argv is not "simply an array of char* pointers", that's misleading, please consider removing that part of your answer. – einpoklum Aug 23 '16 at 16:08
  • @Rhymoid the standard allows the commandline arguments to be laid out in memory like that, with each pointer pointing to the start of the next string etc. – M.M Aug 23 '16 at 20:55
  • @M.M I'm sure it does, but it's not what `argv` itself is. –  Aug 23 '16 at 21:30
  • @Rhymoid in this context you'd say that argv works by pointing to the first of the pointers which point into that series of strings – M.M Aug 23 '16 at 22:09
0

Yes.

The type of argv is char**, i.e. a pointer to pointer to char. Basically, if you consider a char* to be a string, then argv is a pointer to an array of strings.

Peter Mortensen
Peter K
  • The strings don't have to be in an array, but the pointers to them are. Also, the end of the array of pointers is a null. – Marichyasana Aug 23 '16 at 08:21
  • *the type of `argv` is `char**`, i.e. a pointer to an array of pointers to arrays of char*: No. No way. – haccks Aug 23 '16 at 09:01
  • @haccks Care to elaborate? – Peter K Aug 23 '16 at 09:02
  • `char **` is read as *pointer to pointer to `char`* and not *pointer to an array of pointers to arrays of char*. This makes `argv` is of type `char *((*)[])[]`. – haccks Aug 23 '16 at 09:05
  • @haccks My bad. I was hoping to make it more intuitive, but it was just plain incorrect. – Peter K Aug 23 '16 at 09:10
0

Strictly speaking, there are a number of properties that must be present for argv to be an array. Let us consider some of those:

¹/ A null pointer cannot point at an array, as a null pointer is guaranteed to compare unequal to a pointer to any object. Therefore, argv in the following code can't be an array:

#include <assert.h>
int main(int argc, char *argv[]) {
    if (argv) return main(0, 0);
    assert(argv == 0); // argv is a null pointer, not to be dereferenced
}

²/ It's invalid to assign to an array. For example, char *argv[] = { 0 }; argv++; is a constraint violation, but int main(int argc, char *argv[]) { argv++; } compiles and runs fine. Thus we must conclude from this point that argv is not an array when declared as an argument, and is instead a pointer that (might) point into an array. (This is actually the same as point 1, but coming from a different angle, as calling main with a null pointer as argv is actually reassigning argv, something we can't do to arrays).

³/ ... As the C standard says:

Another use of the sizeof operator is to compute the number of elements in an array: sizeof array / sizeof array[0]

For example:

#include <assert.h>
#include <stddef.h> // for size_t
int main(int argc, char *argv[]) {
    size_t size = argc+1; // including the NULL
    char *control[size];
    assert(sizeof control / sizeof *control == size); // this test passes, because control is actually an array
    assert(sizeof argv   / sizeof *argv    == size); // this test fails for all values of size != 1, indicating that argv isn't an array
}

⁴/ The unary & (address-of) operator, when applied to an array, yields the same address as the array's first element but with a different type, so, for example:

#include <assert.h>
int main(int argc, char *argv[]) {
    char *control[42];
    assert((void *) control == (void *) &control); // this test passes, because control is actually an array
    assert((void *) argv    == (void *) &argv); // this test fails, indicating that argv isn't an array
}
autistic
-1

argv is an array of pointers to characters.

The following code displays the value of argv, the contents of argv and performs a memory dump on the memory pointed at by the contents of argv. Hopefully this illuminates the meaning of the indirection.

#include <stdio.h>

// Print every character of a string together with its address
static void print_memory(char * print_me)
{
    char * p;
    for (p = print_me; *p != '\0'; ++p)
    {
        printf ("%p: %c\n", (void *) p, *p);
    }

    // Print the '\0' for good measure
    printf ("%p: %c\n", (void *) p, *p);

}

int main (int argc, char ** argv) {
    int i;

    // Print argv (%p expects a void *, hence the casts)
    printf ("argv: %p\n", (void *) argv);
    printf ("\n");

    // Print the values of argv
    for (i = 0; i < argc; ++i)
    {
        printf ("argv[%d]: %p\n", i, (void *) argv[i]);
    }
    // Print the NULL for good measure
    printf ("argv[%d]: %p\n", i, (void *) argv[i]);
    printf ("\n");

    // Print the values of the memory pointed at by argv
    for (i = 0; i < argc; ++i)
    {
        print_memory(argv[i]);
    }

    return 0;
}

Sample Run:

$ ./a.out Hello World!
argv: ffbfefd4

argv[0]: ffbff12c
argv[1]: ffbff134
argv[2]: ffbff13a
argv[3]: 0

ffbff12c: .
ffbff12d: /
ffbff12e: a
ffbff12f: .
ffbff130: o
ffbff131: u
ffbff132: t
ffbff133:
ffbff134: H
ffbff135: e
ffbff136: l
ffbff137: l
ffbff138: o
ffbff139:
ffbff13a: W
ffbff13b: o
ffbff13c: r
ffbff13d: l
ffbff13e: d
ffbff13f: !
ffbff140:

$

You have this big contiguous array ranging from ffbff12c to ffbff140 which contains the command line arguments (the standard does not guarantee that it is contiguous, but this is how it's generally done). argv just contains pointers into that array so you know where to look for the words.

argv is a pointer... to pointers... to characters

QuestionC
  • It's not in the C or POSIX standard, but it may be guaranteed to be contiguous by the System V ABI standard. – Random832 Aug 23 '16 at 16:30
  • Page 34 of https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf describes the details of what is and is not required for that array: "Argument strings, environment strings, and the auxiliary information appear in no specific order within the information block and they need not be compactly allocated." [but the "information block" itself is a well-defined area at the top of memory that is defined to contain all the strings]. Obviously this is only relevant to systems that this standard applies to. – Random832 Aug 23 '16 at 16:37