3

I'm working in C for the first time and have a question.

How can I get the number of lines in a file?

FILE *in;
char c;
int lines = 1;
...
while (fscanf(in,"%c",&c)  == 1) {
  if (c == '\n') {
    lines++;
  }
}

Is this right? I don't actually know how to detect the point where the text moves on to a new line.

openspace
  • 1
    If you try your code on a few different files, does it seem to work? – Some programmer dude Mar 28 '17 at 12:07
  • 1
    Read a whole line at a time. That'll be faster. And note that some files don't have a new line character at the end – phuclv Mar 28 '17 at 12:07
  • 3
    @LưuVĩnhPhúc That requires knowledge about the length of the longest line, or checking if a full line was read or not. – Some programmer dude Mar 28 '17 at 12:09
  • @Someprogrammerdude do you mean that this code isn't appropriate for different files? – openspace Mar 28 '17 at 12:12
  • 4
    @openspace he means have you tried it with files that have one line, two lines, 100 lines, 100000 lines, files where the *last* line does not terminate with a newline, etc., and they *all* deliver the appropriate known-correct values? In other words: have you *tested* your code? If so, it's probably good. If not, why not? – WhozCraig Mar 28 '17 at 12:14
  • @WhozCraig oh, the line count starts from 1, my bad. – openspace Mar 28 '17 at 12:15
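
The line-at-a-time idea from the comments above can be sketched with fgets(). Because the longest line is not known in advance, the sketch checks whether each chunk ended in '\n' and only counts a chunk when it starts a new line. The name count_lines_fgets and the 256-byte buffer are illustrative choices, not something from the thread, and the sketch assumes the text contains no embedded NUL bytes (strlen() would stop at them).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count lines by reading chunks with fgets().
 * A line longer than the buffer arrives as several chunks, so a chunk
 * is counted only if the previous chunk ended with '\n' (or if it is
 * the first chunk of the file). */
unsigned long long count_lines_fgets(FILE *in) {
  char buf[256];
  unsigned long long lines = 0;
  int at_line_start = 1; /* the next chunk begins a new line */

  while (fgets(buf, sizeof buf, in) != NULL) {
    size_t len = strlen(buf);
    if (at_line_start) {
      lines++;
    }
    /* a chunk without a trailing '\n' means the line continues */
    at_line_start = (len > 0 && buf[len - 1] == '\n');
  }
  return lines;
}
```

With this approach a final line that lacks a terminating '\n' is still counted, and an empty file yields 0.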

3 Answers

4

OP's code works apart from a possible off-by-one issue (the count starts at 1) and a last-line issue.

Standard C library definition

A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. C11dr §7.21.2 2

A line ends with a '\n' and the last line may or may not end with a '\n'.

If using the idea that the last line of a file does not require a final '\n', then the goal is to count the number of times a character is read after a '\n', treating the start of the file as if it followed a '\n'.

// Let us use a wide type
// Start at 0 as the file may be empty
unsigned long long line_count = 0;

int previous = '\n';
int ch;
while ((ch = fgetc(in)) != EOF) {
  if (previous == '\n') line_count++;
  previous = ch;
}

printf("Line count:%llu\n", line_count);
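
To try the loop above on the cases raised in the comments (empty file, one line, missing final newline), it can be wrapped in a function and fed small inputs. This is only a sketch: count_lines and count_lines_in_text are hypothetical names, and the helper relies on tmpfile() being able to create a temporary file.

```c
#include <stdio.h>

/* The loop above as a function: a line is counted whenever a character
 * is read at the start of the file or right after a '\n'. */
unsigned long long count_lines(FILE *in) {
  unsigned long long line_count = 0;
  int previous = '\n';
  int ch;

  while ((ch = fgetc(in)) != EOF) {
    if (previous == '\n') line_count++;
    previous = ch;
  }
  return line_count;
}

/* Hypothetical test helper: write text to a temporary file, rewind it,
 * and count its lines. Returns 0 if the temporary file cannot be made. */
unsigned long long count_lines_in_text(const char *text) {
  FILE *tmp = tmpfile();
  unsigned long long n;

  if (tmp == NULL) return 0;
  fputs(text, tmp);
  rewind(tmp);
  n = count_lines(tmp);
  fclose(tmp);
  return n;
}
```

For example, count_lines_in_text("") gives 0, while "a", "a\n" each give 1 and "a\nb", "a\nb\n" each give 2.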

Reading a file one character at a time may be less efficient than other means, but functionally meets OP's goal.

This answer uses (ch = fgetc(in)) != EOF instead of fscanf(in,"%c",&c) == 1. fgetc() is typically "faster", but with an optimizing compiler either may compile to code of similar performance. Claims about speed should be backed by analysis or profiling. When in doubt, code for clarity.
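
One rough way to put numbers behind a speed claim like this is a small timing harness. The sketch below is an assumption-laden illustration, not a rigorous benchmark: the function names are made up for this example, clock() measures CPU time with coarse resolution, and results will vary with the C library, compiler flags, and file size.

```c
#include <stdio.h>
#include <time.h>

typedef unsigned long long (*counter_fn)(FILE *);

/* Count '\n' characters using fgetc() */
unsigned long long count_with_fgetc(FILE *in) {
  unsigned long long n = 0;
  int ch;
  while ((ch = fgetc(in)) != EOF) {
    if (ch == '\n') n++;
  }
  return n;
}

/* Count '\n' characters using fscanf("%c"), as in the question */
unsigned long long count_with_fscanf(FILE *in) {
  unsigned long long n = 0;
  char c;
  while (fscanf(in, "%c", &c) == 1) {
    if (c == '\n') n++;
  }
  return n;
}

/* Rewind the stream, run one counter over it, store its result, and
 * return the CPU time it took in seconds. */
double time_counter(FILE *in, counter_fn counter, unsigned long long *result) {
  clock_t start, end;

  rewind(in);
  start = clock();
  *result = counter(in);
  end = clock();
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_counter() once with each counter on the same large file, and repeating the runs a few times, gives a first impression; a real profiler is the better tool when the difference matters.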

chux - Reinstate Monica
  • No, fgetc will not be slower than scanf. Indeed, it will probably be faster. It won't read the stream one character at a time but in chunks of (probably) 4096 or 8192 bytes. fgetc() is the correct thing to do. – William Pursell Mar 28 '17 at 13:29
0

You can use this utility function:

#include <stdio.h>

/*
 * Count the number of '\n' characters in the file called filename.
 * Returns 0 if the file cannot be opened, so a missing file and an
 * empty file look the same to the caller.
 */
int countLines(const char *filename)
{
  FILE *in = fopen(filename, "r");
  int ch;
  int lines = 0;

  if (in == NULL) {
    return 0;
  }
  while ((ch = fgetc(in)) != EOF) {
    if (ch == '\n') {
      lines++;
    }
  }
  fclose(in);
  return lines;
}

kourouma_coder
  • 3
    This is broken, never call `feof()` like that. See [Why is “while ( !feof (file) )” always wrong?](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). – unwind Mar 28 '17 at 12:20
  • @lazy_coder. It is not enough to check for `'\n'` character. Depending on operating system, it can be `'\r'`, `'\n'`, or `"\r\n"`. – user7771338 Mar 28 '17 at 12:30
  • 1
    @FREE_AND_OPEN_SOURCE This answer opens the file in text mode `fopen(filename,"r");` and the file's line ending, be it the anticipated `"\r"`, `"\n"`, or `"\r\n"`, is translated to `'\n'` by the C library. Checking `'\n'` is sufficient. If code needs to handle text files from other sources, then many more issues apply and are beyond this post's scope. Given that these other line endings do occur, still, is a good consideration for robust code. – chux - Reinstate Monica Mar 28 '17 at 14:48
0

When counting the 'number of \n characters', you have to remember that you are counting the separators, and not the items. See 'Fencepost Error'
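
The separator-versus-item distinction is easiest to see on an in-memory string. The sketch below uses a made-up function name, lines_in_string, and assumes "a line" includes a final run of characters that has no terminating \n.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration of the fencepost issue: the number of '\n'
 * separators equals the number of lines only when the text ends with
 * '\n'. Otherwise there is one more item than there are separators. */
size_t lines_in_string(const char *s) {
  size_t len = strlen(s);
  size_t newlines = 0;

  for (size_t i = 0; i < len; i++) {
    if (s[i] == '\n') newlines++;
  }
  if (len > 0 && s[len - 1] != '\n') {
    return newlines + 1; /* unterminated last line */
  }
  return newlines;
}
```

So "a\nb\nc" has two separators but three lines, while "a\nb\nc\n" has three separators and still three lines.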

Your example should work, but:

  • if the file does not end with a \n, then you might be off-by-one (depending on your definition of 'a line').
  • depending on your definition of 'a line', you may be missing lone \r line endings in the file (used by classic Mac OS)
  • it will not be very efficient or quick (calling fscanf() once per character is expensive)

The example below will ingest a buffer each time, looking for \r and \n characters. There is some logic to latch these characters, so that the following line endings should be handled correctly:

  • \n
  • \r
  • \r\n

#include <stdio.h>

int main(void) {
    FILE *in;
    char buf[4096];
    int buf_len, buf_pos;
    int line_count, line_pos;
    int ignore_cr, ignore_lf;

    in = fopen("my_file.txt", "rb");
    if (in == NULL) {
        perror("fopen()");
        return 1;
    }

    line_count = 0;
    line_pos = 0;
    ignore_cr = 0;
    ignore_lf = 0;

    /* ingest a buffer at a time */
    while ((buf_len = fread(&buf, 1, sizeof(buf), in)) != 0) {

        /* walk through the buffer, looking for newlines */
        for (buf_pos = 0; buf_pos < buf_len; buf_pos++) {

            /* look for '\n' ... */
            if (buf[buf_pos] == '\n') {
                /* ... unless we've already seen '\r' */
                if (!ignore_lf) {
                    line_count += 1;
                    line_pos = 0;
                    ignore_cr = 1;
                }

            /* look for '\r' ... */
            } else if (buf[buf_pos] == '\r') {
                /* ... unless we've already seen '\n' */
                if (!ignore_cr) {
                    line_count += 1;
                    line_pos = 0;
                    ignore_lf = 1;
                }

            /* on any other character, count the characters per line */
            } else {
                line_pos += 1;
                ignore_lf = 0;
                ignore_cr = 0;
            }
        }
    }

    if (line_pos > 0) {
        line_count += 1;
    }

    fclose(in);

    printf("lines: %d\n", line_count);

    return 0;
}

Attie
  • 2
    Please see [Why is “while ( !feof (file) )” always wrong?](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). – unwind Mar 28 '17 at 12:21
  • Thanks @unwind ... I reverted to my original approach – Attie Mar 28 '17 at 12:29
  • @Attie. It is not enough to check for `'\n'` character. Depending on operating system, it can be `'\r'`, `'\n'`, or `"\r\n"`. – user7771338 Mar 28 '17 at 12:30
  • @FREE_AND_OPEN_SOURCE-- it is sufficient to check for `\n` since the file is opened in text mode; in text mode line-terminators are converted to a new-line character. From the C11 Draft Standard: ["there need not be a one-to-one correspondence between the characters in a stream and those in the external representation"](http://port70.net/~nsz/c/c11/n1570.html#7.21.2p2). – ad absurdum Mar 28 '17 at 13:53
  • 1
    1) Looking for the 3 common lines ending is _good_, IMO, given that "text" files from time-to-time _originate_ on variant platforms. Yet above code does get fooled by such files with a mixture of `\r\n` and `\n`, a too often occurrence in my experience. 2) `fread()` of a _text_ stream is not nearly so valuable a speed improvement as a _binary_ stream. Like @David, recommend opening in binary mode. – chux - Reinstate Monica Mar 28 '17 at 14:16
  • 1
    @DavidBowling Agree looking for `'\n'` is sufficient when the file originates from like code. Yet only checking for `'\n'` is not sufficient when the text file comes from another platform employing a different, unanticipated per the compiler, line ending encoding. IMO, code should open the file in text mode and let the C library handle the underlying variant line endings or open in binary mode and take over the job itself. – chux - Reinstate Monica Mar 28 '17 at 14:23
  • 1
    @DavidBowling Yes to the latter point (of now deleted comment). Counting `'\n'` is likely today to be sufficient (aside from [last line issues](http://stackoverflow.com/a/43070760/2410359)), yet a text file using only `'\r'` may not be converted to `'\n'` on many files opened in text mode these days - it's one _long_ line. I come across these files mostly from captured serial communication logs and rarely from files originating on old Macs and the like. – chux - Reinstate Monica Mar 28 '17 at 14:34
  • 1
    @chux-- yes, agreed. I deleted my comment, but I was mistakenly thinking that common line endings were translated; this was clearly silly, and would be at best implementation-dependent. To handle files from divergent platforms, binary mode is needed. – ad absurdum Mar 28 '17 at 14:38
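
The mixed-ending case raised in the comments above (a file containing both \r\n and \n) can be handled with a single "previous byte was \r" flag instead of two latches. The following is a sketch rather than a drop-in replacement for the answer's code: the struct and function names are hypothetical, the stream is assumed to be opened in binary mode ("rb"), and a \n followed by \r is deliberately treated as two line endings.

```c
#include <stddef.h>
#include <stdio.h>

/* State carried across buffers, so "\r\n" split over two reads still
 * counts as one line ending. */
struct line_counter {
  unsigned long long lines;
  int prev_was_cr; /* last byte seen was '\r' */
  int in_line;     /* bytes seen since the last line ending */
};

void line_counter_feed(struct line_counter *lc, const char *buf, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (buf[i] == '\r') {
      lc->lines++;
      lc->in_line = 0;
      lc->prev_was_cr = 1;
    } else if (buf[i] == '\n') {
      /* a '\n' completing "\r\n" was already counted at the '\r' */
      if (!lc->prev_was_cr) lc->lines++;
      lc->in_line = 0;
      lc->prev_was_cr = 0;
    } else {
      lc->in_line = 1;
      lc->prev_was_cr = 0;
    }
  }
}

/* Total so far, including a final line without any line ending */
unsigned long long line_counter_finish(const struct line_counter *lc) {
  return lc->lines + (lc->in_line ? 1 : 0);
}

/* Count lines in a stream opened with "rb", one buffer at a time */
unsigned long long count_lines_any_ending(FILE *in) {
  struct line_counter lc = {0, 0, 0};
  char buf[4096];
  size_t n;

  while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
    line_counter_feed(&lc, buf, n);
  }
  return line_counter_finish(&lc);
}
```

Because the flag is cleared by the \n that completes a \r\n pair, an empty line written as a bare \n right after a \r\n is still counted.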