13

I have the following snippet of code:

int main(int argc, char *argv[])
{   

     char line[MAXLINE];
     long lineno = 0;
     int c, except = 0, number = 0, found = 0;

     while(--argc > 0 && (*++argv)[0] == '-') //These two lines
        while(c = *++argv[0])                 //These two lines
          switch(c) {
             case 'x':
                  except = 1;
                  break;
             case 'n':
                  number = 1;
                  break;
             default:
                  printf("find: illegal option %c\n", c);
                  argc = 0;
                  found = -1;
                  break;
          }

     ...
}

Containing the following expressions:

while(--argc > 0 && (*++argv)[0] == '-')

Does this expression in the parentheses (*++argv)[0] differ from while(c = *++argv[0]) without parentheses?

If so, how? Does (*++argv) mean pointer to the next argument, and does *++argv[0] mean pointer to the next character in the current char array which is being pointed to?

Tool
  • I'm also interested in one thing: the expression while(c = *++argv[0]). Does this actually mean while(c = *++argv[0] != 0)? I mean, does *++argv[0] return a null pointer to c if it hasn't found a character? – Tool Jan 07 '10 at 14:38
  • As noted in my answer, see K&R's errata entry on this code: http://cm.bell-labs.com/cm/cs/cbook/2ediffs.html – Alok Singhal Jan 07 '10 at 15:27

5 Answers

40

First, K&R's errata list has an entry on this particular snippet:

117(§5.10): In the find example, the program increments argv[0]. This is not specifically forbidden, but not specifically allowed either.

Now for the explanation.

Let's say your program is named prog, and you execute it with: prog -ab -c Hello World. You want to be able to parse the arguments to say that options a, b and c were specified, and Hello and World are the non-option arguments.

argv is of type char **; remember that an array parameter in a function declaration is adjusted to a pointer. At program invocation, things look like this:

                 +---+         +---+---+---+---+---+
 argv ---------->| 0 |-------->| p | r | o | g | 0 |
                 +---+         +---+---+---+---+---+
                 | 1 |-------->| - | a | b | 0 |
                 +---+         +---+---+---+---+
                 | 2 |-------->| - | c | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 3 |-------->| H | e | l | l | o | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 4 |-------->| W | o | r | l | d | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 5 |-------->NULL
                 +---+

Here, argc is 5, and argv[argc] is NULL. At the beginning, argv[0] is a char * pointing to the string "prog".

In (*++argv)[0], because of the parentheses, argv is incremented first, and then dereferenced. The effect of the increment is to move that argv ----------> arrow "one block down", to point to the 1. The effect of dereferencing is to get a pointer to the first command-line argument, -ab. Finally, we take the first character ([0] in (*++argv)[0]) of this string, and test it to see if it is '-', because that denotes the start of an option.

For the second construct, we actually want to walk down the string pointed to by the current argv[0] pointer. So, we need to treat argv[0] as a pointer, ignore its first character (that is '-' as we just tested), and look at the other characters:

++(argv[0]) will increment argv[0], to get a pointer to the first character after the -, and dereferencing it will give us the value of that character. So we get *++(argv[0]). But since in C the postfix [] binds more tightly than the prefix ++ and *, we can actually get rid of the parentheses and write the expression as *++argv[0]. We want to continue processing characters until we reach 0 (the last character box in each of the rows in the above picture).

The expression

c = *++argv[0]

assigns to c the value of the current option, and has the value c. while(c) is a shorthand for while(c != 0), so the while(c = *++argv[0]) line is basically assigning the value of the current option to c and testing it to see if we have reached the end of the current command-line argument.
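To see the inner loop in isolation, here is a minimal sketch. The helper name collect_option_letters and its output buffer are my own assumptions for illustration, not part of the K&R program. It uses the same *++argv[0] expression to copy out the letters that follow the leading -.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from K&R): copies the option letters that follow
   the leading '-' of argv[0] into out and returns how many were stored.
   Like the original loop, it advances the pointer stored in argv[0]. */
static size_t collect_option_letters(char **argv, char *out, size_t outsz)
{
    size_t n = 0;
    int c;

    assert(argv != NULL && argv[0] != NULL && out != NULL && outsz > 0);

    /* ++argv[0] steps to the next character; the loop ends on the '\0'. */
    while ((c = *++argv[0]) != '\0') {
        if (n + 1 < outsz)          /* always leave room for the terminator */
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Called on a slot holding "-ab", this should store "ab" and leave that slot pointing at the terminating 0 of the argument, which is exactly the modification of argv[0] that the errata entry mentions.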

At the end of the outer loop, provided a non-option argument follows the options, argv will point to the first non-option argument:

                 +---+         +---+---+---+---+---+
                 | 0 |-------->| p | r | o | g | 0 |
                 +---+         +---+---+---+---+---+
                 | 1 |-------->| - | a | b | 0 |
                 +---+         +---+---+---+---+
                 | 2 |-------->| - | c | 0 |
                 +---+         +---+---+---+---+---+---+
 argv ---------->| 3 |-------->| H | e | l | l | o | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 4 |-------->| W | o | r | l | d | 0 |
                 +---+         +---+---+---+---+---+---+
                 | 5 |-------->NULL
                 +---+
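Putting both loops together, the sketch below wraps the option parsing in a function so the final position of argv can be inspected. The names parse_options and struct find_opts are assumptions made for this example, and the illegal-option handling is simplified to a flag instead of the printf in the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the flags; the original uses plain ints. */
struct find_opts {
    int except;   /* set by -x */
    int number;   /* set by -n */
    int bad;      /* set when an unknown option letter is seen */
};

/* Hypothetical wrapper around the two K&R loops. Returns the final value
   of argv; *argc is decremented the same way the original decrements argc. */
static char **parse_options(int *argc, char **argv, struct find_opts *o)
{
    int c;

    assert(argc != NULL && argv != NULL && o != NULL);

    while (--*argc > 0 && (*++argv)[0] == '-')      /* next argument */
        while ((c = *++argv[0]) != '\0')            /* next letter in it */
            switch (c) {
            case 'x':
                o->except = 1;
                break;
            case 'n':
                o->number = 1;
                break;
            default:
                o->bad = 1;
                *argc = 0;      /* forces the outer loop to stop */
                break;
            }
    return argv;
}
```

For an argument list like find -x -n pattern, the returned pointer should be the slot holding pattern and *argc should be left at 1. If every argument is an option, the outer loop stops on the argc test instead, and the returned pointer is still at the last option slot.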

Does this help?

Alok Singhal
  • @Alok could you explain this step: if((strstr(line, *argv) != NULL) != except)? – HELP PLZ Jul 04 '14 at 22:30
  • @AbhimanyuAryan `strstr(a, b)` checks if the string `b` exists in `a`. It returns `NULL` if `b` is not in `a`. So, `strstr(line, *argv) != NULL` checks if the string pointed to by `argv` is in `line`, and has the value `1` if it is, and `0` if it isn't. `except` was set to `1` or `0` earlier based on the presence of `x` flag. – Alok Singhal Aug 01 '14 at 18:35
  • Thanks a lot for the careful step-by-step explanation. I had been pulling my hair over this for the last one hour. Now it's as clear as a crystal. Thanks. – Aniruddha Jun 16 '22 at 18:11
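
The (strstr(line, *argv) != NULL) != except test discussed in the comments above can be tried on its own. This is only a sketch: the function name line_selected and its parameter names are assumptions, not code from the book.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when a line should be printed.
   except == 0: select lines that contain pattern.
   except == 1: select lines that do NOT contain pattern. */
static int line_selected(const char *line, const char *pattern, int except)
{
    assert(line != NULL && pattern != NULL);
    assert(except == 0 || except == 1);

    /* (found) != except flips the match result when -x was given. */
    return (strstr(line, pattern) != NULL) != except;
}
```

Both sides of the != are either 0 or 1, so the comparison behaves like an exclusive or of "pattern found" and "-x given".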
5

Yes, you are correct.

while(--argc > 0 && (*++argv)[0] == '-')

is scanning the array (of length argc) of command line arguments one by one, continuing for as long as each argument starts with the - option prefix. For each of those:

while(c = *++argv[0])

is scanning through the set of switch characters that follow the first - in the current argument (i.e. t and n in -tn) until it hits the string's null terminator \0, which terminates the while loop since it evaluates as false.

This design allows both

myApp -t -n

and

myApp -tn

to work, and both are understood as setting the options t and n.
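
As a rough check of that claim, the sketch below runs the same two loops over both spellings and records the letters seen in a bitmask. The function option_mask is a made-up helper for this illustration, and it assumes the letters a to z have consecutive character codes, as in ASCII.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bit i of the result is set when option letter
   'a' + i was seen. Assumes 'a'..'z' are contiguous (true for ASCII). */
static unsigned long option_mask(int argc, char **argv)
{
    unsigned long mask = 0;
    int c;

    assert(argv != NULL);

    while (--argc > 0 && (*++argv)[0] == '-')   /* each leading -argument */
        while ((c = *++argv[0]) != '\0')        /* each letter after the - */
            if (c >= 'a' && c <= 'z')
                mask |= 1UL << (c - 'a');
    return mask;
}
```

Given the argument lists for myApp -t -n and myApp -tn, the two calls should return the same mask, with only the t and n bits set.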

Alex Brown
  • This design is simple and mostly reasonable, apart from the fact that it modifies argc and the contents of the array argv, which is poor design since it prevents any further use of these variables. – Alex Brown Jan 07 '10 at 15:00
5

Incrementing argv is a very bad idea, as once you have done so it is difficult to get the original value back. It is simpler, clearer and better to use an integer index - after all argv IS an array!

To answer your question: ++argv increments the pointer. The result is then dereferenced and indexed with [0] to get the first character of the next argument.

  • Actually, the indirection starts with the first character after the -, and each cycle it moves onto the next, to support clusters of option flags after a single - character. – Alex Brown Jan 07 '10 at 14:51
  • I was referring to (*++argv)[0] == '-' –  Jan 07 '10 at 14:54
4

The parentheses change how the operators group, and therefore which pointer gets incremented.

Without parentheses *++argv[0]:

  1. argv[0] gets the pointer to character data currently pointed to by argv.
  2. ++ increments that pointer to the next character in the character array.
  3. * gets the character.

With parentheses (*++argv)[0]:

  1. ++argv increments the argv pointer to point to the next argument.
  2. * dereferences it to obtain a pointer to the character data.
  3. [0] gets the first character in the character array.
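
The two groupings listed above can be compared side by side. In this sketch the function names are invented for illustration; each one wraps a single expression so that its side effect is easy to observe from the caller.

```c
#include <assert.h>
#include <stddef.h>

/* *++argv[0] groups as *(++(argv[0])): it advances the char pointer stored
   in slot 0 (a change the caller can see) and returns the character that
   pointer now refers to. */
static char next_char_in_current_arg(char **argv)
{
    assert(argv != NULL && argv[0] != NULL);
    return *++argv[0];
}

/* (*++argv)[0] groups as (*(++argv))[0]: it advances argv itself (here only
   this function's local copy) and returns the first character of the next
   argument. The strings and the slots are left untouched. */
static char first_char_of_next_arg(char **argv)
{
    assert(argv != NULL && argv[1] != NULL);
    return (*++argv)[0];
}
```

With slots holding "-ab" and "Hello", the second function should return 'H' without changing anything in the array, while the first should return 'a' and leave slot 0 pointing one character further into "-ab".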
Dave Cluderay
2

Yes, the two expressions differ (though only slightly). IMO, this code is a bit on the excessively clever side. You'd be better off with something like this:

for (int i=1; i<argc; i++)
    if (argv[i][0] == '-') {
       size_t len = strlen(argv[i]);
       for (size_t j=1; j<len; ++j)   // start at 1 to skip the leading '-'
           switch(argv[i][j]) {
               case 'x':
               // ...

This is pretty much equivalent to the code above, but I doubt anybody (who knows C at all) would have any difficulty figuring out what it really does.
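
For reference, one possible way to complete the index-based idea is sketched below. This is my own assumption about how the elided part might look, not the answerer's code: the name parse_flags_indexed, the out-parameters, and the choice to stop at the first non-option argument (as the K&R loop does) are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index-based parser. Leaves argc and argv unmodified.
   Returns the index of the first non-option argument (argc if there is
   none), or -1 when an unknown option letter is found. */
static int parse_flags_indexed(int argc, char *const argv[],
                               int *except, int *number)
{
    int i;

    assert(argv != NULL && except != NULL && number != NULL);

    for (i = 1; i < argc && argv[i][0] == '-'; i++) {
        /* j starts at 1 to skip the leading '-'. */
        for (size_t j = 1; argv[i][j] != '\0'; j++) {
            switch (argv[i][j]) {
            case 'x':
                *except = 1;
                break;
            case 'n':
                *number = 1;
                break;
            default:
                return -1;      /* illegal option */
            }
        }
    }
    return i;
}
```

Because nothing is incremented in place, the caller can still use argv[0] and the original argc afterwards, which addresses the objection raised in the other answers.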

Jerry Coffin
  • but this code would not detect chains of options - you need another iterator to walk the chain of options -tn – Alex Brown Jan 07 '10 at 14:41
  • @Alex Brown: I believe I've fixed that -- though I'm not sure it's necessarily any real improvement. Allowing `-tn` instead of `-t -n` would have meant a fair amount when a typical terminal was a Teletype, but it's hardly worthwhile anymore. – Jerry Coffin Jan 07 '10 at 14:55
  • @Jerry It's entirely worthwhile. Every command-line user expects to be able to provide single-letter options in a group, and being lazy about the code means violating those strongly held expectations. The deeper issue here is the custom-coding of this functionality, rather than the use of getopt or similar. – Phil Miller Jan 07 '10 at 15:05
  • @Novelocrat: I'm afraid I can't really agree -- quite a few command line tools either don't allow clustered arguments at all, or have specific limitations about what arguments can and can't be clustered. Nobody with any substantial amount of experience can honestly have much expectation about this subject. Given that it only supports two arguments, neither with any associated parameter, I can see where using getopt would probably make the code more complex, so I can understand not using it, even though I agree that it *probably* should anyway. – Jerry Coffin Jan 07 '10 at 17:10
  • You try taking tar -xvzf a.tar.gz away from me and see what happens. Or ls -laTr, or ps -elF, etc. – Alex Brown Jan 07 '10 at 21:20