0

Right now this code doesn't remove inline comments. How do I change it so that it also removes them?

FILE *output;
output = fopen("preprocess_output.c", "w");

while (fgets(line, LINE_LENGTH, file) != NULL)
{
    for (int i = 0; i < strlen(line); i++)
    {
        if (line[i]  ==  '/' && line[i + 1]  ==  '/')
        {
            comment_lines++;
        }
        else
        {
            fprintf(output, line);
        }
        if (line[i] != '\n' && line[i] != '\t')
        {
            non_blank++;
            break;
        }   
    }
}
chqrlie
S.Gammon
  • 4
    How would you do it manually? – Yunnosch Mar 03 '19 at 10:47
  • Which kind of inline comments would you want removed? Those starting and ending on the same line? – Yunnosch Mar 03 '19 at 10:49
  • 1
    Think about how you, yourself, identify comments and teach it to the computer. – machine_1 Mar 03 '19 at 10:49
  • not sure what you mean – S.Gammon Mar 03 '19 at 10:49
  • 2
    Look at some piece of code with comments you want removed. Underline them with a pen. Try to realise how you recognised them. Describe it - to yourself then to us in your question. That would be one step towards "demonstrating own research effort", which you are supposed to do. – Yunnosch Mar 03 '19 at 10:50
  • So right now I'm writing to a new file and I'm supposed to remove all comments in that new file, but it doesn't remove the comments that are on the same line as other code – S.Gammon Mar 03 '19 at 10:50
  • Please give examples. I think we are actually thinking of different things when reading "inline comments". Please show them. Define them. Show the desired result. – Yunnosch Mar 03 '19 at 10:51
  • 1
    You should loop on chars, not on lines. Just check if "//" then skip until '\n', if "/*" then skip until "*/" – caxapexac Mar 03 '19 at 10:52
  • To phrase it differently: Please answer the following question in a way that it cannot be misunderstood (and I will intentionally try...): What is an inline comment? – Yunnosch Mar 03 '19 at 10:52
  • int add(int a, int b) { return a + b; // An inline comment. <------ it does not remove this comment } – S.Gammon Mar 03 '19 at 10:54
  • What is wrong about this approach "Look at the first two letters in a line. If they are not // print the line. If the line is not empty go to next line." ? Does anything about that sound inappropriate? – Yunnosch Mar 03 '19 at 10:56
  • i'm just not very good at C, started it a couple months ago so not everything makes sense to me – S.Gammon Mar 03 '19 at 10:57
  • 1
    In that case: https://ericlippert.com/2014/03/21/find-a-simpler-problem/ or https://ericlippert.com/2014/03/05/how-to-debug-small-programs/ – Yunnosch Mar 03 '19 at 10:58
  • You need to truncate the line at the point where you find the // by placing a `'\0'` there. This raises another thing: `fgets` leaves a newline at the end of each line which must be removed. – Weather Vane Mar 03 '19 at 10:59
  • Read my question about the approach. It is English, not C. Do you spot anything weird about that? Look at a line with an inline comment, read my English approach. What is wrong? – Yunnosch Mar 03 '19 at 10:59
  • The thing is, I know what I did wrong: I'm checking whether a line begins with / followed by a /, and if so the line is removed. I'm just not sure how to change it so that it also removes comments which don't start at the beginning of a line – S.Gammon Mar 03 '19 at 11:03
  • I told you how to do that in my previous comment. – Weather Vane Mar 03 '19 at 11:16
  • Cleanest approach is to make a StateMachine (FSA/DFA). It only needs to recognise `*`, `/` and `\n`. Maybe later also `"`, `'` and backslash. Maintaining state across lines would allow you to recognise multiline `/* ... */` comments as well. – wildplasser Mar 03 '19 at 12:22
  • See comments in https://stackoverflow.com/questions/50500652/c-program-to-remove-comments-from-file which provide some idea as to how complex it is to properly remove comments. – Richard Chambers Mar 03 '19 at 13:19
  • The preprocessor removes comments, among other things. Not sure why you want to remove comments, but you could generate the intermediate source file without comments by using the `-E` (or `/E`) option on your compiler. – cdarke Mar 03 '19 at 14:14
  • @S.Gammon: you can accept one of the answers by clicking on the grey checkmark below its score. – chqrlie Apr 13 '19 at 12:03
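
The character-by-character idea suggested in the comments above can be sketched as a small state machine. This is only a minimal illustration under stated assumptions: the function name `strip_comments` and the state names are made up for this sketch, it works on an in-memory string rather than a file, and it deliberately ignores string literals, character constants and escaped newlines, so a `//` inside a string would wrongly be treated as a comment.

```c
#include <stddef.h>

/* Hypothetical states for this sketch; not taken from the question. */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR };

/* Copy src to dst without comments. dst must have room for
 * strlen(src) + 1 bytes. Returns the number of bytes written,
 * not counting the terminating null byte. */
size_t strip_comments(const char *src, char *dst) {
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;
        switch (st) {
        case CODE:
            if (c == '/')
                st = SLASH;            /* might start a comment */
            else
                dst[n++] = c;
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {
                dst[n++] = '/';        /* it was a plain slash */
                dst[n++] = c;
                st = CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {
                dst[n++] = '\n';       /* keep the line break */
                st = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;       /* might be the closing star */
            break;
        case BLOCK_STAR:
            if (c == '/') {
                dst[n++] = ' ';        /* comment becomes one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;
        }
    }
    if (st == SLASH)
        dst[n++] = '/';                /* input ended with a lone slash */
    dst[n] = '\0';
    return n;
}
```

Because the state lives in one variable, the same logic could be fed one byte at a time from `getc`, which is what lets a `/* ... */` comment span several lines.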

2 Answers

0

Here is a small program that strips C comments in almost all cases.

/* strip C comments by chqrlie */

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* read the next byte from the C source file, handling escaped newlines */
int getcpp(FILE *fp, int *lineno_p) {
    int ch;
    while ((ch = getc(fp)) == '\\') {
        if ((ch = getc(fp)) != '\n') {
            ungetc(ch, fp);
            return '\\';
        }
        *lineno_p += 1;
    }
    if (ch == '\n')
        *lineno_p += 1;
    return ch;
}

int main(int argc, char *argv[]) {
    FILE *fp = stdin, *ft = stdout;
    const char *filename = "<stdin>";
    int ch, lineno;

    if (argc > 1) {
        if ((fp = fopen(filename = argv[1], "r")) == NULL) {
            fprintf(stderr, "Cannot open input file %s: %s\n",
                    filename, strerror(errno));
            return 1;
        }
    }
    if (argc > 2) {
        if ((ft = fopen(argv[2], "w")) == NULL) {
            fprintf(stderr, "Cannot open output file %s: %s\n",
                    argv[2], strerror(errno));
            return 1;
        }
    }
    lineno = 1;
    while ((ch = getcpp(fp, &lineno)) != EOF) {
        int startline = lineno;
        if (ch == '/') {
            if ((ch = getcpp(fp, &lineno)) == '/') {
                /* single-line comment */
                while ((ch = getcpp(fp, &lineno)) != EOF && ch != '\n')
                    continue;
                if (ch == EOF) {
                    fprintf(stderr, "%s:%d: unterminated single line comment\n",
                            filename, startline);
                    break;
                }
                putc('\n', ft);  /* replace comment with newline */
                continue;
            }
            if (ch == '*') {
                /* multi-line comment */
                int lastc = 0;
                while ((ch = getcpp(fp, &lineno)) != EOF) {
                    if (ch == '/' && lastc == '*') {
                        break;
                    }
                    lastc = ch;
                }
                if (ch == EOF) {
                    fprintf(stderr, "%s:%d: unterminated comment\n",
                            filename, startline);
                    break;
                }
                putc(' ', ft);  /* replace comment with single space */
                continue;
            }
            putc('/', ft);
            /* keep parsing to handle n/"a//"[i] */
        }
        if (ch == '\'' || ch == '"') {
            int sep = ch;
            const char *const_type = (ch == '"') ? "string" : "character";

            putc(sep, ft);
            while ((ch = getcpp(fp, &lineno)) != EOF) {
                putc(ch, ft);
                if (ch == sep)
                    break;
                if (ch == '\\') {
                    if ((ch = getcpp(fp, &lineno)) == EOF)
                        break;
                    putc(ch, ft);
                }
                if (ch == '\n') {
                    fprintf(stderr, "%s:%d: unescaped newline in %s constant\n",
                            filename, lineno - 1, const_type);
                    /* This is a syntax error but keep going as if constant was terminated */
                    break;
                }
            }
            if (ch == EOF) {
                fprintf(stderr, "%s:%d: unterminated %s constant\n",
                        filename, startline, const_type);
                break;
            }
            continue;
        }
        putc(ch, ft);
    }
    if (fp != stdin)
        fclose(fp);
    if (ft != stdout)
        fclose(ft);
    return 0;
}

Since you get a full answer for free, try and learn how the above code handles strings and escaped newlines. There are still some corner cases that are not supported. Can you find them?

  • One such corner case: the code does not parse trigraphs, an obsolescent feature that may be used to hide \ characters.
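
To illustrate why trigraphs matter here: in the C standards that define them, the three-character sequence `??/` is translated to `\` before escaped newlines are processed, so a line ending in `??/` can continue a `//` comment onto the next line. A minimal sketch of the trigraph mapping follows; the helper name `trigraph_char` is hypothetical and is not part of the program above.

```c
/* Return the character that the trigraph "??c" stands for,
 * or 0 if "??c" is not a trigraph. Hypothetical helper. */
int trigraph_char(int c) {
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case '/':  return '\\';
    case ')':  return ']';
    case '\'': return '^';
    case '<':  return '{';
    case '!':  return '|';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}
```

A byte reader could call something like this after seeing two consecutive `?` characters and then treat the result as if it had been read directly from the file.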
chqrlie
-1

In the following solution, there is a single pass over each line. If a `//` comment is found, the line is truncated at that point and then printed. Supporting `/* */` comments requires more work.

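while (fgets(line, LINE_LENGTH, file) != NULL)
{
    size_t len = strlen(line);
    size_t i;

    /* scan the line once, looking for the start of a // comment */
    for (i = 0; i < len; i++)
    {
        if (line[i] == '/' && line[i + 1] == '/')
        {
            /* cut the line here but keep a line break, so the
               output is not joined with the next line */
            line[i] = '\n';
            line[i + 1] = '\0';
            break;
        }
    }
    fprintf(output, "%s", line);
}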

Note two points in addition to the logic:

  • When printing with printf or fprintf, always use a format string. If the line contains a % character and is passed as the format, it might do unexpected things.

  • Do not put strlen in the condition of a loop. The length may be recomputed on every iteration, which is a lot of unnecessary work.
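
As a sketch, the truncation step can also be written as a small function that keeps the line break. The name `strip_line_comment` is made up for this example, and like the loop above it does not recognise `//` inside string literals.

```c
#include <string.h>

/* Truncate line at the first "//", keeping a trailing newline if the
 * original line had one. Returns 1 if a comment was removed, else 0.
 * Hypothetical helper; "//" inside string literals is not detected. */
int strip_line_comment(char *line) {
    char *p = strstr(line, "//");
    size_t len;

    if (p == NULL)
        return 0;
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        *p++ = '\n';    /* overwrite the first slash with the newline */
    *p = '\0';          /* end the string at (or just after) the cut */
    return 1;
}
```

It could be called on each buffer returned by `fgets`, followed by `fputs(line, output)`, which avoids the format-string issue altogether.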

eyalm
  • i appreciate your help but this code that you suggested removes the whole line, along with the code that's on the same line – S.Gammon Mar 03 '19 at 11:21
  • 1
    `*line = '\0';` ==> `line[i] = '\0';`. This also removes any trailing newline so you should remove the trailing newline from *every* line anyway. – Weather Vane Mar 03 '19 at 11:22
  • I'm afraid your code is incorrect: you must replace the `//` with `\n\0` for the line to have a newline and avoid concatenation with the next line. – chqrlie Mar 03 '19 at 11:25
  • It is also advisable to use type `size_t` for `i` and `len`. – chqrlie Mar 03 '19 at 11:27
  • 1
    Finally, how do you handle overlong lines that are read in chunks by `fgets()`? The OP does not handle these either, but you should at least hint at the potential problem. It is easier to handle this problem with a loop that reads one byte at a time. – chqrlie Mar 03 '19 at 11:28
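
A minimal sketch of the byte-at-a-time loop mentioned in the last comment, which has no line-length limit. The function name `copy_without_line_comments` is an assumption for this example; it handles only `//` comments and ignores string literals, block comments and escaped newlines.

```c
#include <stdio.h>

/* Copy in to out, dropping everything from "//" up to (but not
 * including) the end of the line. Hypothetical helper. */
void copy_without_line_comments(FILE *in, FILE *out) {
    int ch;

    while ((ch = getc(in)) != EOF) {
        if (ch == '/') {
            int next = getc(in);
            if (next == '/') {
                /* skip the rest of the line, however long it is */
                while ((ch = getc(in)) != EOF && ch != '\n')
                    continue;
                if (ch == EOF)
                    break;
                /* ch is '\n' here and is written below */
            } else if (next != EOF) {
                ungetc(next, in);   /* re-read it on the next pass */
            }
        }
        putc(ch, out);
    }
}
```

Since nothing is buffered per line, a comment longer than `LINE_LENGTH` is skipped in one go instead of being split across several `fgets` calls.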