What is the fastest way to count the lines of an ASCII file?
-
In a .txt file, for example; basically I need the newlines – Sunscreen Nov 25 '10 at 16:02
5 Answers
Normally you read files in C using `fgets`. You can also use `scanf("%[^\n]")`, but quite a few people reading the code are likely to find that confusing and foreign.
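For example, a typical `fgets` read loop looks roughly like the following sketch (the buffer size and the use of `stdin` are placeholders of mine, not anything from the question):

#include <stdio.h>

int main(void)
{
    char buf[256];   /* arbitrary per-line buffer */

    /* each successful fgets call returns one line (or a fragment of a
       line longer than the buffer), until it returns NULL at EOF */
    while (fgets(buf, sizeof buf, stdin) != NULL) {
        /* process buf here */
    }
    return 0;
}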
Edit: on the other hand, if you really do just want to count lines, a slightly modified version of the `scanf` approach can work quite nicely:
while (EOF != (scanf("%*[^\n]"), scanf("%*c")))
    ++lines;
The advantage of this is that with the '*' in each conversion, `scanf` reads and matches the input, but does nothing with the result. That means we don't have to waste memory on a large buffer to hold the content of a line that we don't care about (and still take a chance of getting a line that's even larger than that, so our count ends up wrong unless we go to even more work to figure out whether the input we read ended with a newline).
Unfortunately, we do have to break up the `scanf` into two pieces like this. `scanf` stops scanning when a conversion fails, and if the input contains a blank line (two consecutive newlines) we expect the first conversion to fail. Even if that fails, however, we want the second conversion to happen, to read the next newline and move on to the next line. Therefore, we attempt the first conversion to "eat" the content of the line, and then do the `%c` conversion to read the newline (the part we really care about). We continue doing both until the second call to `scanf` returns `EOF` (which will normally be at the end of the file, though it can also happen in case of something like a read error).
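Put together, a complete version of that counting loop might look like this sketch (the surrounding `main`, the variable name `lines`, and the output line are scaffolding I've added, not part of the original answer):

#include <stdio.h>

int main(void)
{
    unsigned long lines = 0;

    /* "%*[^\n]" eats the body of a line (it fails on an empty line, which
       is fine), and "%*c" then consumes the newline itself; stop once that
       second call returns EOF */
    while (EOF != (scanf("%*[^\n]"), scanf("%*c")))
        ++lines;

    printf("%lu\n", lines);
    return 0;
}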
Edit2: Of course, there is another possibility that's (at least arguably) simpler and easier to understand:
int ch;
while (EOF != (ch=getchar()))
    if (ch=='\n')
        ++lines;
The only part of this that some people find counterintuitive is that `ch` must be defined as an `int`, not a `char`, for the code to work correctly.
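For completeness, a self-contained version of that loop, with the `int`-versus-`char` point spelled out in comments (the `main` wrapper and the output line are my additions):

#include <stdio.h>

int main(void)
{
    int ch;   /* must be int: it has to hold every byte value *and* EOF */
    unsigned long lines = 0;

    /* With `char ch` instead: if plain char is unsigned, ch == EOF can
       never be true and the loop never ends; if it is signed, an input
       byte of 0xFF typically compares equal to EOF and the loop stops
       too early. */
    while (EOF != (ch = getchar()))
        if (ch == '\n')
            ++lines;

    printf("%lu\n", lines);
    return 0;
}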

-
Can I do a while loop until fgets returns NULL? Like this: `while(fgets(szTmp, 256, pfFile)) nLines++;` – Sunscreen Nov 25 '10 at 16:03
-
@Sunscreen: no, you can't. fgets() will return fragments if your line is longer than 256 characters and your count will be too high. You have to check for the EOL character. – icanhasserver Nov 25 '10 at 16:18
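To make the `fgets` idea from the comment above count correctly even when a line is longer than the buffer, count a chunk only when it actually contains the newline. A sketch, reusing the hypothetical names from the comment:

#include <stdio.h>
#include <string.h>

/* Count '\n' characters using fgets: a long line read in several 256-byte
   fragments is counted once, because only the fragment containing the
   newline increments the count. */
static unsigned long count_lines(FILE *pfFile)
{
    char szTmp[256];
    unsigned long nLines = 0;

    while (fgets(szTmp, sizeof szTmp, pfFile) != NULL)
        if (strchr(szTmp, '\n') != NULL)
            nLines++;

    return nLines;
}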
-
@Sunscreen: see edits. Now that it's clear what you really want, there is an approach I think is a bit cleaner than using `fgets`. – Jerry Coffin Nov 25 '10 at 16:40
-
A big +1! This answer (1) gives detailed explanations of everything the code does and how it deals with input cases, (2) avoids all failure cases by not using any buffers, and (3) demonstrates one of the rare correct uses of the `scanf` family. – R.. GitHub STOP HELPING ICE Nov 25 '10 at 16:42
-
In my opinion, using two scanf() calls simply to count newlines is not the most straightforward method of achieving the desired result. It's certainly more obfuscated and certainly less efficient than a single call to fgetc() or getchar(), as presented by a couple of the other answers here. – Kamal Nov 25 '10 at 16:52
-
@vlabrecque: "not the fastest", except when it is. Just for one example, I've seen a real implementation of C that did essentially no buffering, so the overhead of I/O calls was *quite* high -- to the point that each individual call took almost constant time. Improving speed meant reducing the number of individual calls. On that system, the `scanf` version would almost certainly be faster than those using `getc`, `getchar`, etc. – Jerry Coffin Nov 25 '10 at 17:29
-
@Jerry Coffin: I'm not sure I understand your example. You saw a standard C library which does no I/O buffering when you use getc, but does so when using scanf? – vlabrecque Nov 25 '10 at 17:40
-
It only does buffering inside the kernel, and each individual call ends up going to kernel mode. The time taken for a single call is almost constant, whether it reads 1 character or 1000. – Jerry Coffin Nov 25 '10 at 18:24
-
I should add that (at least on *most* systems) none of it makes any difference anyway -- the time to count lines in (say) a 1 megabyte file will be indistinguishable from the time it takes to read that much data from disk. Using getc, getchar, fread, scanf, etc., won't make any measurable difference. – Jerry Coffin Nov 25 '10 at 18:26
-
@Jerry Coffin: So you had an implementation of stdio where (1) getc() did a 1-byte read(3) system call, but (2) the scanf() implementation did a big read(3) system call and did internal buffering? Only in scanf? – vlabrecque Nov 25 '10 at 18:27
-
@vlabrecque: Not *only* scanf -- `fread`, and `fwrite` did big calls too (and, technically, it wasn't to `read(3)`, since it wasn't UNIX, but it was to its closest equivalent). – Jerry Coffin Nov 25 '10 at 18:35
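One standard-C knob that is relevant to this buffering discussion is `setvbuf`, which lets you request a larger user-space buffer so that fewer underlying reads are needed; whether it actually helps depends entirely on the implementation. A sketch (the 64 KiB size is an arbitrary choice of mine):

#include <stdio.h>

int main(void)
{
    static char big_buf[1 << 16];   /* 64 KiB, chosen arbitrarily */
    int c;
    unsigned long lines = 0;

    /* Request full buffering with our own large buffer; this must happen
       before any other operation on the stream. */
    if (setvbuf(stdin, big_buf, _IOFBF, sizeof big_buf) != 0)
        fprintf(stderr, "setvbuf failed, using default buffering\n");

    while ((c = getchar()) != EOF)
        if (c == '\n')
            ++lines;

    printf("%lu\n", lines);
    return 0;
}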
-
I think there's one corner-case you've missed - what about a file where the last line doesn't end in newline? – caf Nov 26 '10 at 01:54
-
If a file consists only of non-`'\n'` characters, this method reports 0 when 1 is expected. – chux - Reinstate Monica Apr 29 '14 at 22:43
-
@chux: Yes--if you read the OP's comment, he specifies that he wants to count new-lines, so that's what I showed. IOW, you might expect a 1, but the original poster didn't (at least didn't seem to). – Jerry Coffin Apr 30 '14 at 03:37
-
@Jerry Coffin Agree - Saw OP's title "count the lines" and not OP's differing comment of "need the newlines". – chux - Reinstate Monica Apr 30 '14 at 12:09
Here's a solution based on fgetc() which will work for lines of any length and doesn't require you to allocate a buffer.
#include <stdio.h>

int main()
{
    FILE *fp = stdin;  /* or use fopen to open a file */
    int c;             /* Nb. int (not char) for the EOF */
    unsigned long newline_count = 0;

    /* count the newline characters */
    while ( (c=fgetc(fp)) != EOF ) {
        if ( c == '\n' )
            newline_count++;
    }

    printf("%lu newline characters\n", newline_count);

    return 0;
}

-
I've tried a million different ways to count new lines using all of the methods suggested above and yours was the only one that worked for me! So thank you – Maheen Siddiqui Aug 19 '13 at 23:52
Come on, why do you compare all the characters? It is very slow: for a 10 MB file it takes ~3 s.
The solution below is faster.
unsigned long count_lines_of_file(char *file_path) {
    FILE *fp = fopen(file_path, "r");
    unsigned long line_count = 0;

    if (fp == NULL) {
        return 0;
    }

    /* fgetline() is not standard C; it is assumed to be a helper that
       reads (and discards) one line, returning NULL/0 at end of file. */
    while (fgetline(fp))
        line_count++;

    fclose(fp);
    return line_count;
}
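Note that `fgetline` is not a standard C function, so the code above won't compile as-is on a typical system; presumably it stands in for some line-reading helper. On POSIX systems, `getline` can play that role. A sketch using POSIX `getline` (the allocation handling is my addition):

#define _POSIX_C_SOURCE 200809L   /* for getline */
#include <stdio.h>
#include <stdlib.h>

unsigned long count_lines_of_file(char *file_path) {
    FILE *fp = fopen(file_path, "r");
    unsigned long line_count = 0;
    char *line = NULL;
    size_t cap = 0;

    if (fp == NULL) {
        return 0;
    }

    /* POSIX getline reads one whole line per call, growing the buffer as
       needed, and returns -1 at end of file or on error. */
    while (getline(&line, &cap, fp) != -1)
        line_count++;

    free(line);
    fclose(fp);
    return line_count;
}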

-
It depends on the length of the line. For my task it was ~400 times faster. – Krzysztof Szewczyk May 15 '13 at 13:02
-
Why is it faster? The internal implementation of fgetline() also has to compare every character to find the newline... – Max Snijders Jun 01 '13 at 11:28
-
readahead and multi-threading would make a difference (and a true aio filesystem) – scheiflo Aug 26 '13 at 06:44
-
`if (fp == NULL) fclose(fp)` If the pointer is `NULL` doesn't this imply the file wasn't found or something? Anyway, it means it wasn't _opened_ in the first place. Why do you need to call `fclose`? (I only know for sure that this applies in [C++](http://stackoverflow.com/questions/24487381/closing-c-file-stream-is-not-opened). Is it the same in C?) – Arc676 Aug 24 '15 at 09:39
Maybe I'm missing something, but why not simply:
#include <stdio.h>

int main(void) {
    int n = 0;
    int c;

    while ((c = getchar()) != EOF) {
        if (c == '\n')
            ++n;
    }
    printf("%d\n", n);
}
if you want to count partial lines (i.e. `[^\n]` immediately followed by EOF):
#include <stdio.h>

int main(void) {
    int n = 0;
    int pc = EOF;   /* previous character read */
    int c;

    while ((c = getchar()) != EOF) {
        if (c == '\n')
            ++n;
        pc = c;
    }
    if (pc != EOF && pc != '\n')   /* count a final line with no trailing newline */
        ++n;
    printf("%d\n", n);
}

-
+1 IMO, this is the best `getchar()` answer as it deals with the last line not terminated with `'\n'`. Suggested minor simplification: `int pc = '\n'; while (..) { ...} if (pc != '\n') ++n;` – chux - Reinstate Monica Apr 29 '14 at 22:51
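Spelled out, the simplification suggested in that comment looks roughly like this (initializing `pc` to `'\n'` so an empty input still reports zero lines; the `main` wrapper is mine):

#include <stdio.h>

int main(void) {
    int n = 0;
    int pc = '\n';   /* pretend the input starts just after a newline */
    int c;

    while ((c = getchar()) != EOF) {
        if (c == '\n')
            ++n;
        pc = c;
    }
    /* If the last character read was not a newline there is one more,
       unterminated, line to count; empty input leaves pc == '\n'. */
    if (pc != '\n')
        ++n;
    printf("%d\n", n);
}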
What about this?
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4096

int main(int argc, char** argv)
{
    int count;
    int bytes;
    FILE* f;
    char buffer[BUFFER_SIZE + 1];
    char* ptr;

    if (argc != 2 || !(f = fopen(argv[1], "r")))
    {
        return -1;
    }

    count = 0;
    while (!feof(f))
    {
        bytes = fread(buffer, sizeof(char), BUFFER_SIZE, f);
        if (bytes <= 0)
        {
            return -1;
        }
        buffer[bytes] = '\0';

        for (ptr = buffer; ptr; ptr = strchr(ptr, '\n'))
        {
            ++count;
            ++ptr;
        }
    }

    fclose(f);
    printf("%d\n", count - 1);
    return 0;
}

-
no reason to buffer stdio's buffered input. Also, this will report -1 on an empty file. – vlabrecque Nov 25 '10 at 17:06
-
Do not recommend this. It exits with `-1` on any file whose length is a multiple of `BUFFER_SIZE` including an empty file as noted by @vlabrecque. – chux - Reinstate Monica Apr 30 '14 at 11:45
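For reference, here is one way the buffered approach could be adjusted to avoid the failure modes pointed out in these comments: loop on the byte count `fread` actually returns instead of `feof()`, count newlines with `memchr` so the result is exactly the number of `'\n'` bytes, and drop the `count - 1` adjustment. This is a sketch of mine, not the original author's code:

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4096

int main(int argc, char **argv)
{
    char buffer[BUFFER_SIZE];
    FILE *f;
    size_t bytes;
    unsigned long count = 0;

    if (argc != 2 || !(f = fopen(argv[1], "r")))
        return -1;

    /* Loop on the byte count itself, so an empty file or a file whose size
       is an exact multiple of BUFFER_SIZE simply ends the loop instead of
       being treated as an error. */
    while ((bytes = fread(buffer, 1, sizeof buffer, f)) > 0) {
        const char *p = buffer;
        const char *end = buffer + bytes;

        /* memchr honours the byte count, so no terminator is needed and
           embedded '\0' bytes in the input cannot cut the scan short. */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            ++count;
            ++p;
        }
    }

    fclose(f);
    printf("%lu\n", count);
    return 0;
}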