1

I have to write a program that counts the number of times an operator that returns the address of a variable (&) is found inside a file.

I use this simple loop to do so (do not mind the !feof(p) that raises some questions):

while (!feof(p)){   
c = fgetc(p);
if (c=='&') n++; }

However, this does not satisfy my needs. For instance, if a logical AND operator (&&) is found, my loop increments the variable "n" twice, although it should not count it at all. Also, if the & operator appears inside a single-line or multi-line comment, it should not be counted.

My question is: how can I tell whether a given character/string (in my case the "&" operator) is inside a comment? And how can I make sure it is indeed a "&" operator and not part of a "&&" or a string?

  • [*Why is while (!feof(file)) always wrong?*](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Oliver Charlesworth Dec 29 '17 at 15:43
  • More generally, you are going to need a real parser here. (Assuming you're parsing C, another case you'll need to handle is the binary `&` operator. Or an `&` in a string literal.) – Oliver Charlesworth Dec 29 '17 at 15:45
  • Possible duplicate of [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Dmitry Dec 29 '17 at 15:45
  • @Dmitry - that's not the proximate issue here, though. – Oliver Charlesworth Dec 29 '17 at 15:51
  • You might be able to get away with a state machine. Otherwise, I think Oliver is right, you'll need a real parser. Maybe piggy-back off a compiler like Clang. – Christian Gibbons Dec 29 '17 at 15:57
  • @OliverCharlesworth thanks for the immediate answer but I did not really catch that. Can you provide more details? – Miroslav Bozhkov Dec 29 '17 at 16:03
  • There are many different contexts in which the `&` character can appear in C code. You need some kind of parser to distinguish between these contexts (note that a parser is a kind of state machine, so ChristianGibbon's suggestion is also correct). – Oliver Charlesworth Dec 29 '17 at 16:05
  • A possible solution is that you store the last character and if it's `&` you decrease the counter, so the `&&` is not counted. As for single line comments you can set a flag once you read two `/` in a row and then not count if the flag is set. You unset the flag when you reach a new line. As for multi-line comments you do the same thing but set the flag at `/*` and unset it at `*/`. – Kys Plox Dec 29 '17 at 16:07
  • @KysPlox Not to mention all other cases, like `char c='&'` or `int mask = 0x1 & val`. – klutt Dec 29 '17 at 16:11
  • @KysPlox - And string literals, and character literals, and bitwise-and operator, ... The framework for building the required state machine is ultimately a lexer + parser. – Oliver Charlesworth Dec 29 '17 at 16:11
  • To do what you want by hand you will need a full parser. C is not a context-free language. You are basically asking "how to write a parser for C", which is too broad for a question on Stack Overflow ;). – Stargateur Dec 29 '17 at 16:12
  • Thanks again. I think that the parser thing is above my skills. The assistant professor at university suggested your way @KysPlox but I gave it a few tries and can't seem to make it work. – Miroslav Bozhkov Dec 29 '17 at 16:12
  • Well I posted an example code of that below as an answer – Kys Plox Dec 29 '17 at 16:22
  • @MiroslavBozhkov I posted a decent version now. – klutt Dec 29 '17 at 18:16

2 Answers

0

As has been mentioned in the comments, this is not a trivial task that can be solved in a few lines of code. What you need is a parser, and that parser has to handle many different cases. Here is a (probably non-exhaustive) list:

  • One line comments: // This is a comment
  • Multiline comments: /* This is a comment */
  • Characters: char c='&'
  • String literals: strcmp(str, "A string with a & in it")
  • The bitwise operator: int a = mask & b
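To make the list concrete, here is a small sketch that puts each context of `&` side by side. The function name and the values are made up for illustration; only the first `&` is the address operator the question wants to count.

```c
/* Returns 1 when every '&' below behaves as its comment says.
   The function name and the values are hypothetical. */
int ampersand_contexts_ok(void)
{
    int a = 6, b = 3;
    int *p = &a;              /* unary &: address-of, the one to count */
    int bits = a & b;         /* binary &: bitwise AND, 6 & 3 is 2 */
    int both = a && b;        /* &&: logical AND, yields 1 here */
    char ch = '&';            /* & inside a character constant */
    const char *s = "a & b";  /* & inside a string literal */
    /* & inside a comment */

    return *p == 6 && bits == 2 && both == 1 && ch == '&' && s[2] == '&';
}
```

A plain character scan sees the same byte in all six places, which is why some knowledge of the surrounding context is needed.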

You would also need to decide how to handle incorrect input. Should the program be able to detect incorrect c code, or should it assume all input is correct? Another thing to consider is how to handle #include. Do you want to count the number of occurrences in the included files too? (I assume not, but this demonstrates a problem)

If you want it to be 100% accurate in finding only the address operator, then it is way above your knowledge. (OP wrote "This is a problem is designed to be solved by 1st-semester students with only basic knowledge." in comment below)

If you're allowed to cut some corners there are easier ways.

Here is a complete example that cuts some corners. It handles comments, string literals and character constants, including escaped characters. However, it does not handle the bitwise operator.

#include <stdio.h>
#include <stdlib.h>

#define INPUT "input.c"

int main()
{
    FILE *f;

    if ((f = fopen(INPUT, "r")) == NULL)
    {
        perror (INPUT);
        return (EXIT_FAILURE);
    }

    int c, p=0;   // int, not char: fgetc returns an int so that EOF can be told apart from real characters
    int n=0;

    while((c = fgetc(f)) != EOF)
    {
        if(c == '/' && p == '/') {
            // If we read // then we throw away the rest of the line
            while((c = fgetc(f)) != EOF) {
                if( c == '\n' ) {
                    break;
                }
            }
            if( c == EOF) {
                goto end;
            }
        }

        else if(c == '*' && p == '/') {
            // If we read /* then we throw away everything until we have read */
            int q = 0;   // previous character inside the comment
            while((c = getc(f)) != EOF) {
                // Comparing against the previous character also catches "**/"
                if( q == '*' && c == '/' )
                    break;
                q = c;
            }
            if ( c == EOF) {
                goto end;
            }
        }

        else if(c == '"') {
            // Read until end of string
            while((c = getc(f)) != EOF) {
                if(c == '\\') {
                    if((c = getc(f)) == EOF)
                       goto end;
                }
                else if(c == '"')
                    break;
            }
        }

        else if(c == '\'') {
            while((c = getc(f)) != EOF) {
                if(c == '\\') {
                    if((c = getc(f)) == EOF)
                       goto end;
                }
                else if(c == '\'')
                    break;
            } if ( c == EOF)
                  goto end;
        }

        else if(c == '&') {
            // The second & of && undoes the count made for the first one
            if(p == '&')
                n--;
            else
                n++;
        }

        p=c;
    }
    end:
    printf("\n\nExited at pos %ld\n", ftell(f));
    printf("Number of address operators: %d\n", n);
    fclose(f);
    return EXIT_SUCCESS;
}

It works a bit like this: When it sees a start of a comment, it reads and throws away everything until the comment is finished or EOF. It does the same for strings.
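The same skip-ahead idea can also be written as an explicit state machine, as suggested in the comments. The following is only a sketch under a few assumptions: it scans a string that is already in memory instead of a file, the state names and the function name `count_single_ampersands` are made up, and like the program above it still counts the binary `&` (and also the `&` of `&=`).

```c
#include <stddef.h>

/* Scanner states; the names are illustrative, not taken from the answer. */
enum state { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING_LIT, CHAR_LIT };

/* Counts '&' characters that are not part of "&&" and that lie outside
   comments, string literals and character constants. */
int count_single_ampersands(const char *src)
{
    enum state st = CODE;
    int n = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        char c = src[i];
        char next = src[i + 1];   /* at worst this is the terminating '\0' */

        switch (st) {
        case CODE:
            if (c == '/' && next == '/') { st = LINE_COMMENT; i++; }
            else if (c == '/' && next == '*') { st = BLOCK_COMMENT; i++; }
            else if (c == '"') st = STRING_LIT;
            else if (c == '\'') st = CHAR_LIT;
            else if (c == '&') {
                if (next == '&') i++;   /* consume both halves of "&&" */
                else n++;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') st = CODE;
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') { st = CODE; i++; }
            break;
        case STRING_LIT:
        case CHAR_LIT:
            if (c == '\\' && next != '\0') i++;   /* skip the escaped character */
            else if (c == (st == STRING_LIT ? '"' : '\'')) st = CODE;
            break;
        }
    }
    return n;
}
```

Looking one character ahead instead of one character back means `&&` never has to be un-counted, and each state only needs to know how it ends.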

On this input:

// Test &
/* Also
   &
   test */

// "


int main()
{
    /* " //
     */
    // /*

    char str[]="hej&\"";
    char c='&';
    char k='\'';
    int a, b;
    int * p;
    p=&a;
    int c=a&b;
    int q=a&&b;
}

// Test &
/* Also
   &
   test */

It reports the expected result 2. It would be better if it printed 1, but as I mentioned, it cannot handle the bitwise operator, thus counting it as an address operator. Fixing this issue would make things a lot more complicated.
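One way to get closer without a real parser is a heuristic on what stands to the left of the `&`: if the previous non-blank character ends an operand (a letter, digit, `_`, `)` or `]`), treat the `&` as the binary operator, otherwise as address-of. This is only a sketch, not a correct solution. It assumes comments and literals have already been skipped, the name `count_address_ops` is made up, and it gets cases such as `return &x;`, `sizeof &x` or `(int *)&x` wrong.

```c
#include <ctype.h>
#include <stddef.h>

/* Heuristic sketch: assumes the input contains no comments or literals. */
int count_address_ops(const char *code)
{
    int n = 0;
    char prev = '\0';   /* last non-whitespace character seen */

    for (size_t i = 0; code[i] != '\0'; i++) {
        char c = code[i];

        if (isspace((unsigned char)c))
            continue;

        if (c == '&') {
            char next = code[i + 1];
            if (next == '&' || next == '=') {
                i++;        /* "&&" or "&=": never an address-of */
                c = next;
            } else if (!(isalnum((unsigned char)prev) || prev == '_' ||
                         prev == ')' || prev == ']')) {
                n++;        /* nothing operand-like on the left: unary & */
            }
        }
        prev = c;
    }
    return n;
}
```

On the statements `p=&a; int c=a&b; int q=a&&b;` this counts only the `&` in `p=&a`. The failure cases listed above are exactly where a real parser would be needed.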

And yes, I'm using goto since it is extremely convenient in a situation like this. In C++, I'd use exceptions, but that's not an option in C.

klutt
  • Thank you! I guess that I have asked too much but isn't there an easy way? This is a problem is designed to be solved by 1st-semester students with only basic knowledge. – Miroslav Bozhkov Dec 29 '17 at 16:22
  • @MiroslavBozhkov I updated the answer a bit to answer your comment. The task *"detect all single occurrences of & that are not inside a comment"* is a fairly easy task, but as you can see it is not 100% accurate if you want to count address operators only. – klutt Dec 29 '17 at 16:31
-1

Covering all the cases in the C language would be pretty hard, and you would probably need a proper parser. But if you only intend to use this as an exercise, to make it work in the cases described in the question, you could implement something like this:

char previous = 0;
int single_line_comment = 0;
int multi_line_comment = 0;
int in_string = 0;
int in_char = 0;
while (!feof(p)){   
    c = fgetc(p);
    if (c == '&' && !single_line_comment && !multi_line_comment && !in_string && !in_char)
    {
        if(previous == '&')
            n--;
        else
            n++;
    }
    else if(c == '/' && previous == '/' && !multi_line_comment && !in_string && !in_char)
        single_line_comment = 1;
    else if(previous == '/' && c == '*' && !single_line_comment && !in_string && !in_char)
        multi_line_comment = 1;
    else if(c == '\n' && !multi_line_comment && !in_string && !in_char)
        single_line_comment = 0; 
    else if(previous == '*' && c == '/' && !single_line_comment && !in_string && !in_char)
        multi_line_comment = 0;
    else if(c == '"' && !single_line_comment && !multi_line_comment && !in_char)
        in_string = !in_string;
    else if(c == '\'' && !single_line_comment && !multi_line_comment && !in_string)
        in_char = !in_char;
    previous = c;
}

Of course this is not a perfect solution, but it could give an idea of how to overcome some of the problems.
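The loop above keeps the question's `while (!feof(p))`, which the linked question explains is problematic: `feof` only becomes true after a read has already failed, so the body runs once more with `c == EOF`. A minimal sketch of the usual replacement follows; the function name `count_char` and its parameters are made up for illustration.

```c
#include <stdio.h>

/* Counts how often `target` occurs in the stream.
   Hypothetical helper, not part of the code above. */
long count_char(FILE *fp, int target)
{
    long n = 0;
    int c;   /* int, not char, so that EOF stays distinct from real bytes */

    /* Test the result of the read itself instead of asking feof() first. */
    while ((c = fgetc(fp)) != EOF) {
        if (c == target)
            n++;
    }
    return n;
}
```

The same loop header can replace `while (!feof(p))` in the state-flag code, provided `c` is declared as `int`.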

Kys Plox
  • Thanks! Why do people seem to dislike your method? – Miroslav Bozhkov Dec 29 '17 at 16:26
  • Maybe because it is incomplete, or because I used `while (!feof(p))`, which isn't good. But I just edited your code, and fixing this is a separate problem. More information about that: https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong – Kys Plox Dec 29 '17 at 16:28
  • @MiroslavBozhkov By the way you thank someone by marking the answer as being correct (checking the checkmark next to the answer). This will give your rep a +2 boost. – Kys Plox Dec 29 '17 at 16:32
  • I downvoted because it's very far from a useful/working solution. – Oliver Charlesworth Dec 29 '17 at 16:36
  • @OliverCharlesworth Same here. For instance it doesn't even work on all the cases OP specifically stated, like strings. – klutt Dec 29 '17 at 16:38
  • Well, I didn't notice that OP wanted it to work with strings as well, but anyhow that would be trivial to do in the same manner. – Kys Plox Dec 29 '17 at 16:43
  • I added being in string and char cases. Note escaping the `'` with a backslash – Kys Plox Dec 29 '17 at 16:49
  • This code would make the line `// "` trigger `in_string` – klutt Dec 29 '17 at 16:50
  • Also, the line `// /*` would trigger `multi_line_comment` – klutt Dec 29 '17 at 16:52
  • One more thing, it cannot handle escaped `"` inside a string. Well, basically this is pretty broken, because there are a couple of more bugs too, and it's far from trivial (although not extremely difficult) to fix it. – klutt Dec 29 '17 at 16:58
  • Good point. I edited it so it works in these cases. – Kys Plox Dec 29 '17 at 16:59
  • As for escaping that would need some more logic but that wasn't requested – Kys Plox Dec 29 '17 at 17:02
  • The question states *"inside a string"* and your code does not work on all strings. – klutt Dec 29 '17 at 17:06