4

I found this function and I don't understand certain parts of it. I have about three days' worth of C experience, so bear with me. The function is meant to parse command-line arguments.

  1. Why do they reassign *arg to *c ?

  2. I don't understand why they are running a while loop.

  3. Related to that, why would they run a while loop against a char pointer? I understand that a char pointer actually points to an array of characters, but my understanding is that the only reason to run a while loop against it is to access the array's characters individually, and they don't do any of that.

  4. How can you increment against a char?

  5. Why do we even have *c?

  6. I added the string check to see if the arg is -styles for example, which has a - so I can parse the flag and obtain the value, which is the next arg in argv — is that correctly used?

Like I said, I've got about 3 days of C experience, so please be thorough and methodical and as helpful as possible as to help me better understand this function and C overall.

void print_args(int argc, char *argv[])
{
    int i;
    if (argc > 1) {
        for (i = 1; i < argc; i++) {
            char *arg = argv[i];
            char *c = arg;
            while (*c) {
                if (strchr("-", *c)) {
                    printf("arg %d: %s -> %s\n", i, arg, argv[i+1]);
                }
                c++;
            }
        }
    }
}
Jonathan Leffler
somejkuser
  • 1) `if(argc > 1){`: this condition is superfluous. The loop condition will take care of it (because it is basically the same test). 2) `while(*c){}`: transform this into a `for(c=arg;*c;c++)` loop and you won't need the c++ anymore. And it will save you two lines! – wildplasser Nov 25 '12 at 01:08
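
The suggestion in the comment above can be sketched as follows. This is only an illustration: the function names `count_dashes_while` and `count_dashes_for` are made up for this example, and counting dashes stands in for whatever the loop body is really meant to do.

```c
#include <stddef.h>

/* Count the dashes in a string using the while-loop shape from the question. */
size_t count_dashes_while(const char *arg)
{
    size_t n = 0;
    const char *c = arg;
    while (*c) {            /* stop at the terminating '\0' */
        if (*c == '-')
            n++;
        c++;                /* the step is easy to forget in this form */
    }
    return n;
}

/* The same traversal as a for loop: init, test and step sit on one line. */
size_t count_dashes_for(const char *arg)
{
    size_t n = 0;
    const char *c;
    for (c = arg; *c; c++)
        if (*c == '-')
            n++;
    return n;
}
```

Both functions visit exactly the same characters; the for form just keeps the increment next to the loop condition.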

4 Answers

2

1) arg is assigned to point to the head of the current char array, and c is used to traverse the array.

2,3,5) The while loop runs until c points to (char)0, which incidentally is also '\0'. So in effect they go over every character in the char array until they reach the null terminator. A better conditional would have been while (*c != '\0'), rather than relying implicitly on '\0' == 0.

4) They increment a pointer, thus making it point to the next memory cell, i.e. the next array cell.

6) Your addition will work as a test to recognise an option, you'd still have to compare it against -styles to see that it is indeed a valid option.
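
A minimal sketch of points 1 to 5, assuming the string is properly null-terminated. The name `walk_to_end` is invented for this example; it only shows that `c` moves while `arg` keeps pointing at the first character.

```c
#include <stddef.h>

/* Walk a copy of the pointer to the terminator and report how far it moved. */
size_t walk_to_end(const char *arg)
{
    const char *c = arg;        /* c starts at the same address as arg */
    while (*c)                  /* false once *c is the terminating '\0' */
        c++;                    /* step to the next char in the array */
    return (size_t)(c - arg);   /* arg still points at the first char */
}
```

For a valid string the result is the same number that `strlen` would return, which is why a second pointer is needed if you still want the start of the string afterwards.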

This code sample could do with a lot of fixing up to make it more robust and clearer. If this is from a book or C tutorial, I suggest you look for another.

StoryTeller - Unslander Monica
  • Au contraire: `while (*c)` is idiomatic, `while (*c != '\0')` is not. Writing it out the longer way is a sign of someone who isn't fully fluent in C yet. (It is less wrong to do this for a loop over characters than if you're checking whether a pointer is NULL.) – zwol Nov 24 '12 at 23:59
  • @Zack writing it out the shorter way is a sign of someone who doesn't want their code to be readable at first glance. – StoryTeller - Unslander Monica Nov 25 '12 at 00:01
  • please also see my addition - #6 – somejkuser Nov 25 '12 at 00:01
  • @jkushner, the conditional for one. As Zack pointed out, it may be more idiomatic, but it conveys intent poorly. Plus it makes an implicit assumption about the default encoding. `\0` is numerically 0, but it doesn't always have to be. Other than that, the placing of `}` could be better; I personally prefer them aligned with their opening statement. And some comments, for heaven's sake. – StoryTeller - Unslander Monica Nov 25 '12 at 00:11
  • So the while loop is to just ensure that c is a valid char with character values besides the null character, essentially? Yeh, comments should have been in there. – somejkuser Nov 25 '12 at 00:14
  • @jkushner, the while loop doesn't ensure the char array ends with a null terminator, it **assumes and relies on it**. It traverses the array (and the commands in it compare every char in the array to '-'). – StoryTeller - Unslander Monica Nov 25 '12 at 00:18
  • Right. I misworded myself. It runs until it gets to the NULL pointer. So if the arg is NULL, it won't enter the while loop. Correct? – somejkuser Nov 25 '12 at 00:19
  • If `arg` (and therefore `c`) is NULL then `*c` will cause a seg fault. If arg points to an empty string (i.e. the first cell contains '\0') then it won't enter the while loop. – StoryTeller - Unslander Monica Nov 25 '12 at 00:22
  • @DimaRudnik When I say `if (*c)` is idiomatic I mean *that is the construct that is most readable at a glance for someone fully fluent in the language*. Someone fully fluent in the language who sees `if (*c != '\0')` will have a mental hiccup as they wonder why it wasn't written `if (*c)`. – zwol Nov 25 '12 at 00:26
  • @Zack, let's agree to disagree. – StoryTeller - Unslander Monica Nov 25 '12 at 00:29
  • @DimaRudnik Also, you're mistaken about character encodings: `'\0' == 0` is guaranteed to be true by the C standard. (You might be thinking by analogy with `'\n' == 10` which is *not* guaranteed, and is in fact not true on EBCDIC systems. But octal and hexadecimal character escapes always have their face value.) – zwol Nov 25 '12 at 00:29
  • @DimaRudnik Can you further explain the seg fault issue? Are you referring to where the char *c is initialized as the value of arg? How do I properly handle this to ensure no seg fault exists? – somejkuser Nov 25 '12 at 00:31
  • @DimaRudnik, This is not a matter of opinion. I have plenty of *opinions* about the correct way to write C which are not shared by all fully-fluent C programmers (for instance, the correct placement of opening curly braces) and I bite my tongue and conform to house style on those. But it is an *empirical fact* that skilled C programmers will understand `if (*c)` faster than `if (*c != '\0')`. People have done experiments. – zwol Nov 25 '12 at 00:32
  • @jkushner, I mean that if `c` is NULL, then the dereference will cause an error. You can guard against it by writing `c != NULL && *c != '\0'`. Or the _more idiomatic_ `c && *c`. – StoryTeller - Unslander Monica Nov 25 '12 at 00:35
  • @jkushner You don't have to worry about a segfault in this code *as long as* `print_args` is only ever invoked with the real `argc` and `argv` passed to `main`; the standard guarantees that for all values of `i` from 0 to `argc-1`, `argv[i]` will not be a null pointer. (`argv[argc]`, however, is *required* to be a null pointer.) – zwol Nov 25 '12 at 00:36
  • addendum: if, on your system, `printf("%s", (char *)0)` segfaults, then so will this code. I do not remember off the top of my head whether the standard requires that to not segfault, and I don't have my copy of the standard on this computer. – zwol Nov 25 '12 at 00:38
  • @DimaRudnik I'm interested in learning more about handling seg faults. Is this a good resource to what you are talking about? http://stackoverflow.com/questions/554138/catching-segfaults-in-c – somejkuser Nov 25 '12 at 00:43
  • @jkushner It is a good source, but I was only correcting your terminology. It seemed to me you were confusing a NULL pointer with a null terminator. For most programs (especially ones a beginner will write) it's better to solve a segfault by guarding against it, rather than writing a custom signal handler. – StoryTeller - Unslander Monica Nov 25 '12 at 00:50
  • Ah. I didn't realize there's a null terminator and a null pointer. – somejkuser Nov 25 '12 at 00:51
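
The difference between a null pointer and a null terminator discussed in these comments can be sketched with the guard suggested above. The helper name `has_chars` is hypothetical and exists only for this illustration.

```c
#include <stddef.h>

/* Returns 1 if s is a non-null pointer to a non-empty string, otherwise 0.
   The pointer is checked first, so *s is never evaluated when s is NULL. */
int has_chars(const char *s)
{
    return s != NULL && *s != '\0';
}
```

A null pointer points at no string at all, while an empty string is a valid array whose first character is the terminator. The `&&` operator short-circuits, which is what makes the guard safe.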
1

Most of the individual things that code does could make sense alone, but that while loop over the characters in arg is bizarre. It's something like "print arg and next arg for every dash in arg". But I doubt anyone really wants that.

Ben Jackson
1

I turned print_args into the main function of a program that contains nothing else and ran it. This is what it does:

$ ./a.out a ab abc abcd
arg 1: a -> ab
arg 2: ab -> abc
arg 2: ab -> abc
arg 3: abc -> abcd
arg 3: abc -> abcd
arg 3: abc -> abcd
arg 4: abcd -> (null)
arg 4: abcd -> (null)
arg 4: abcd -> (null)
arg 4: abcd -> (null)

In other words, for each character of each command line argument, it prints that entire command line argument, a thin arrow, and the next command line argument. (If my C library wasn't nice about printf("%s", (char *)NULL), it would crash when it got to the end.) Printing the entire command line argument is why it needs both c and arg, although that line perfectly well could have read

printf("arg %d: %s -> %s\n", i, argv[i], argv[i+1]);

and then arg would be unnecessary.

Your guess is as good as mine as to why this function does this particular thing. To me it does not seem like a particularly useful thing to do.

EDIT: If your goal is to print argv[i] and argv[i+1] whenever argv[i] starts with a dash and argv[i+1] is not NULL (which is a thing that might actually make sense in context), then you should write it like so:

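#include <stdio.h>

/* Print each argument that starts with a dash, together with the
   argument that follows it.  In C, calling this with main's argv
   needs a cast: print_args(argc, (const char *const *)argv). */
void
print_args(int argc, const char *const *argv)
{
    int i;
    /* Start at 1: argv[0] is the program name, not an option. */
    for (i = 1; i < argc; i++)
        if (argv[i][0] == '-' && argv[i+1])
            printf("arg %d: %s -> %s\n", i, argv[i], argv[i+1]);
}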

If you want to print whenever argv[i] contains a dash, then replace

argv[i][0] == '-'

with

strchr(argv[i], '-')

in the if statement.
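
To make the difference between the two tests concrete, here is a small sketch. The helper names `starts_with_dash` and `contains_dash` are invented for this example, and both assume a non-null, null-terminated argument.

```c
#include <string.h>

/* True only when the very first character is a dash, e.g. "-styles". */
int starts_with_dash(const char *arg)
{
    return arg[0] == '-';
}

/* True when a dash appears anywhere in the string, e.g. "my-file". */
int contains_dash(const char *arg)
{
    return strchr(arg, '-') != NULL;   /* strchr(string, character) */
}
```

Note the argument order of `strchr`: the string to search comes first and the character to look for comes second.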

Addendum: I strongly recommend you read http://www.gnu.org/software/libc/manual/html_node/Argument-Syntax.html which describes how modern UNIX command line programs are supposed to interpret their arguments.

zwol
  • It's dependent on the fact that the user will supply a -option value in that order, so it should only print when it's on -option arg. For example, ./a.out -test a -test2 abc – somejkuser Nov 25 '12 at 00:10
  • The version I experimented with was the first version you posted, which did not have the additional `if (strchr("-", *c))` conditional in there. (Which is a ridiculous and slow way to write `if (*c == '-')`, by the way.) – zwol Nov 25 '12 at 00:24
  • It's not meant to confirm if C is equal to '-', but if it includes '-'. Is there a better way to handle this? – somejkuser Nov 25 '12 at 00:39
  • Thanks for the help. Just out of curiosity, how should I approach situations when I have `-someflag somevalue -someotherflag -athirdflag`. In this situation, there is no value passed for `-someotherflag`, so it uses the next key, which is a flag also. – somejkuser Nov 25 '12 at 01:25
  • @jkushner You will need to have a table somewhere that says whether each `-something` takes a value. When it does, increment `i` by two instead of one. When it doesn't, increment by one. (You do, after all, need to allow option values to begin with a dash.) – zwol Nov 25 '12 at 01:40
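
One way the table idea from the last comment might look, as a rough sketch only. Everything here is hypothetical: the `option_spec` struct, the `options` table, and the functions `find_option` and `count_values` are names made up for this example, and the flag names are taken from the comment above rather than from any real program.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: an option name and whether it takes a value. */
struct option_spec {
    const char *name;
    int takes_value;
};

const struct option_spec options[] = {
    { "-someflag",      1 },
    { "-someotherflag", 0 },
    { "-athirdflag",    0 },
};

/* Look an argument up in the table; NULL means it is not a known option. */
const struct option_spec *find_option(const char *arg)
{
    size_t k;
    for (k = 0; k < sizeof options / sizeof options[0]; k++)
        if (strcmp(arg, options[k].name) == 0)
            return &options[k];
    return NULL;
}

/* Count the options that were given together with a value. */
int count_values(int argc, char *argv[])
{
    int i = 1;                      /* skip the program name */
    int n = 0;
    while (i < argc) {
        const struct option_spec *opt = find_option(argv[i]);
        if (opt && opt->takes_value && i + 1 < argc) {
            n++;                    /* argv[i + 1] is this option's value */
            i += 2;                 /* step over the option and its value */
        } else {
            i++;                    /* plain flag or unknown argument */
        }
    }
    return n;
}
```

Because the table decides whether the next argument is consumed, a value that happens to start with a dash is still treated as a value and not as another flag.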
0

When you increment a pointer, it moves to the next element of the type it points to. For a char pointer that is the next byte, so if you're at address 0000, the next one is 0001, then 0002, etc. (a pointer to a 4-byte int would go 0000, 0004, 0008 instead). The value at that address could be, for instance, a character in the string. So if the string is "hello", then *c would be 'h', and after c++, *c would be 'e'.
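
A small sketch of how far one increment moves a pointer, under the assumption that the string passed to `second_char` has at least one character. The function names `char_step`, `int_step` and `second_char` are invented for this illustration.

```c
#include <stddef.h>

/* Bytes advanced by one increment of a char pointer: always 1. */
size_t char_step(void)
{
    char buf[2] = { 'h', 'e' };
    char *c = buf;
    char *next = c + 1;
    return (size_t)(next - c) * sizeof *c;
}

/* Bytes advanced by one increment of an int pointer: sizeof(int). */
size_t int_step(void)
{
    int buf[2] = { 0, 0 };
    int *p = buf;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Character read after stepping once past the start of a string.
   Assumes s is not empty. */
char second_char(const char *s)
{
    const char *c = s;
    c++;                /* now points at the second character */
    return *c;
}
```

The step size depends on the pointed-to type, which is why walking a string with a char pointer visits every character one at a time.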

while loops need some care, because you can easily get stuck in an infinite loop if the condition never becomes false. Here the loop is there to ensure that every character in the string has been accounted for.

I'm not totally sure why they assign arg to c.

Jacob Morrison