
While reading text files line by line, I noticed significant performance drops when using C++ std::getline compared to the C function getline, using GCC 4.8.5.

I limited my tests to a very simple program counting the number of lines.

In C++:

#include <iostream>
#include <string>
using namespace std;

int main() {
    size_t n = 0;
    string line;

    while (getline(cin, line)) ++n;

    cout << n << endl;

    return 0;
}

In C:

#include <stdio.h>
#include <stdlib.h>

int main() {
    size_t n = 0;
    char* line = NULL;
    size_t len = 0;

    while (getline(&line, &len, stdin) != -1) ++n;

    printf("%zu\n", n);

    free(line);
    return 0;
}

In both cases I use the C++ compiler to make sure that the difference comes exclusively from getline and not from C vs. C++ compiler optimizations.

I start to notice performance drops after "only" a few thousand lines.

Examples with ~600, 2.5k, 20k, 157k, 10M, 40M, 50M, 60M and 80M lines:

Nb lines  |  Time (C)  | Time (C++) | slower
----------+------------+------------+--------
      613 |  0m0.003s  |  0m0.005s  |  1.5x
     2452 |  0m0.002s  |  0m0.011s  |  5.5x
    19616 |  0m0.004s  |  0m0.062s  |   15x
   156928 |  0m0.014s  |  0m0.511s  |   37x
 10043392 |  0m0.776s  |  0m31.560s |   41x
 40173568 |  0m3.335s  |  2m7.752s  |   38x
 50216960 |  0m5.543s  |  2m42.116s |   29x
 60260352 |  0m22.571s |  3m13.148s |    9x
 80347136 |  0m27.713s |  4m18.272s |    9x

These numbers should be taken with a pinch of salt, but I think they reflect how the gap grows with file size. There also seems to be a ceiling of roughly 40x before the ratios start to even out slightly, probably due to other limitations (hardware, maybe?) rather than the software itself.

Considering I read almost exclusively files with 100k+ lines, I should expect a slowdown of at least 9x if I stick to the C++ code.

Is there a reason for such a big difference? I understand that the STL overhead can be significant but I thought that file access would outweigh such differences.

Also, is there a way to optimize calls to std::getline (apart from using its C counterpart, obviously)?

Additional notes

  • I tried with GCC 9.3.0 and got similar results
  • I tried some compiler optimizations but observed no significant improvement; the build/test command is: `g++ -c count.cpp && g++ -o count count.o && (time ./count < infile)`
  • Following @Eljay’s advice, I added `ios_base::sync_with_stdio(false);` and `cin.tie(NULL);` at the beginning of the C++ code (see the modified version below) and the results are much better, although not quite on par with the C code; it could still be an acceptable trade-off between performance and readability (note: this isolated code is readable in both C and C++, but the full code is much more readable in C++)
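
For reference, here is the C++ version with those two lines applied; everything else is unchanged from the code above:

#include <iostream>
#include <string>
using namespace std;

int main() {
    // Decouple C++ streams from C stdio and untie cin from cout,
    // as suggested by @Eljay in the comments.
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    size_t n = 0;
    string line;

    while (getline(cin, line)) ++n;

    cout << n << endl;

    return 0;
}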
  • These are completely different functions, they just happen to have the same name. Because unmentionable substances were smoked when they named these. You are comparing apples and oranges. – Lundin Jun 03 '21 at 11:59
  • Also keep in mind that by default C++ syncs its standard input and output buffers with C's buffers – topoly Jun 03 '21 at 12:08
  • For C++, `ios_base::sync_with_stdio(false);` and `cin.tie(NULL);` as the first two lines of `main` to decouple `cin` from `stdin`. – Eljay Jun 03 '21 at 12:13
  • `getline()` is also not standard C. It is POSIX. – Peter Jun 03 '21 at 12:30
  • @OP Please post the optimizations you used when you built the application. If you are timing "debug" or unoptimized builds, the timings you're showing are meaningless. – PaulMcKenzie Jun 03 '21 at 12:42
  • @Eljay thank you. These lines alone have sped up things significantly. I am now "only" 2-3x slower than C. – vdavid Jun 03 '21 at 12:44
  • @PaulMcKenzie no optimizations, otherwise I would have mentioned them. As a matter of fact I tried optimizations like `-O2` or `-O3` and whatnot but it had very little impact and sometimes made things slower. Here is my compile/test command: `g++ -c count.cpp && g++ -o count count.o && (time ./count < infile)` – vdavid Jun 03 '21 at 12:47
  • @4386427 I run them from the terminal. They are stopped automatically when all lines are read. Since there is no output besides the final count, I assume the terminal performance has zero impact on the result. – vdavid Jun 03 '21 at 12:48
  • Have you also tried with a more recent version of gcc? – Bob__ Jun 03 '21 at 12:52
  • @Bob__ I tried with gcc 9.3 and I get similar results – vdavid Jun 03 '21 at 13:49
  • @vdavid good thing the C version works in C++ too. – Aykhan Hagverdili Jun 03 '21 at 14:56
  • @vdavid optimization doesn't make a big difference here because you're IO-bound. But for compute-bound applications the optimized version may run tens or hundreds of times faster. Never run anything without optimization unless you're debugging. [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394/995714), [Why is this code running over 100 times slower in Debug mode than Release?](https://stackoverflow.com/q/36514709/995714) – phuclv Jun 03 '21 at 15:10
  • @phuclv You are ultimately right, but since I did not observe significant changes with optimizations enabled I didn’t want to provide details that could sidetrack us. – vdavid Jun 03 '21 at 15:28
  • @vdavid My advice -- avoid the iostreams library if you can. It is over-designed, over-complicated, every implementation is poorly optimized; it requires a lot of esoteric knowledge to do trivial things right; it is a monster from the past that we can't leave behind for some reason... – C.M. Jun 03 '21 at 15:49
  • @Peter actually it's in Dynamic Memory TR. – JDługosz Jun 03 '21 at 15:58
  • @C.M. That’s wise advice considering the time I’ve wasted with these classes so far. What other "good" alternatives do we have, be it in the C++ standard or somewhere else? – vdavid Jun 03 '21 at 15:58
  • @vdavid use `<cstdio>` for basic i/o, write your own stuff for other stuff. What specifically do you need? – C.M. Jun 03 '21 at 16:00
  • The fastest way, I believe, on modern machines is to create a memory mapping for the file, and implement an object to return a `std::string_view` of the text up to (not including) the delimiter on each call. – JDługosz Jun 03 '21 at 16:01
  • @C.M. In my case I need to parse large log files line by line (usually on the order of tens of thousands of lines). I assumed `getline` would be a good place to start until I realized how inefficient it is in C++. – vdavid Jun 03 '21 at 16:02
  • @vdavid In this case `<cstdio>` should be enough for your needs; `FILE` performs decent caching. – C.M. Jun 03 '21 at 16:11
  • @JDługosz The fastest way would be to run a (large enough) buffer/window populated asynchronously with API that bypasses OS caching and drops data directly into your buffer (i.e. zero-copy). Implementing this would require using system-specific APIs (that likely would require properly aligned reads, etc) – C.M. Jun 03 '21 at 16:28
  • Inefficient is a relative term, and a 10,000 line file should show a negligible difference regardless of whether you use C++ `getline()` or C `fgets()` (there will be timing differences, but they should be less than the time it takes you to blink). Now if you have a 100,000,000 line file -- then the difference becomes much more apparent. For a 10,000 line file, the biggest efficiencies will be in how you are parsing each line rather than the read itself. Yes, iostream is slow, but unless you are actually faced with the 80M line file, then it is really a premature micro-optimization concern. – David C. Rankin Jun 03 '21 at 17:01
  • @DavidC.Rankin In theory, yes, what we do with the lines is supposed to be more impactful than optimizing how the file is read. That’s precisely why I didn’t bother in the first place. In practice, for my full algorithm the C++ version runs in 7 seconds versus 2.5 seconds in C. Sure the impact is lower _overall_, but IMO it is significant enough to make it worth looking into. – vdavid Jun 03 '21 at 17:16
  • Most definitely. Great inquiry. You now know where the bodies are buried if your log file size increases and more efficiencies are needed. – David C. Rankin Jun 03 '21 at 17:20
  • @C.M. on current mainstream operating systems, opening/reading a file is built on top of memory mapping. The low-level buffer is a mapped view, and a read within that window triggers a page fault. Any kind of "read" is already on top of this. Disabling the low-level handle buffer and doing async aligned read to your own memory is essentially the same thing. I suppose the advantage is that you _own_ that memory and control when it gets overwritten, so you can use `string_view` pointing into it. Is that what you mean by "zero copy"? – JDługosz Jun 04 '21 at 14:21
  • @JDługosz "opening/reading a file is built on top of memory mapping" -- no, not always; certainly not for NFS/etc. Advantage would be overlapping reads with processing, in your model next chunk is not retrieved (from disk/network) until processing generates page fault (and processing has to pause until retrieval completes); in mine -- it reads "ahead" of processing, plus you can take advantage of specific knowledge (like reading in 64kb chunks from NFS server instead of 4kb). Of course, using string_view or similar is a given. "zero copy" = using API that doesn't cause extra data copies – C.M. Jun 04 '21 at 16:08
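
For completeness, here is a rough sketch of the memory-mapped approach JDługosz describes in the comments above: map the whole file once and hand out `std::string_view` slices per line instead of copying each line into a `std::string`. This is only an illustration under a few assumptions (a POSIX system with `open`/`fstat`/`mmap`, C++17, and the file name passed as `argv[1]` instead of reading stdin, since a pipe cannot be mapped):

#include <cstdio>
#include <string_view>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

    const size_t size = static_cast<size_t>(st.st_size);
    if (size == 0) { std::printf("0\n"); return 0; }

    // Map the file read-only; the kernel pages it in on demand.
    void* mapped = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) { perror("mmap"); return 1; }
    const char* data = static_cast<const char*>(mapped);

    std::string_view rest(data, size);
    size_t n = 0;

    // "getline" without copying: each line is a view into the mapping.
    while (!rest.empty()) {
        size_t pos = rest.find('\n');
        std::string_view line = rest.substr(0, pos);
        (void)line; // per-line parsing would go here
        ++n;
        rest.remove_prefix(pos == std::string_view::npos ? rest.size() : pos + 1);
    }

    std::printf("%zu\n", n);

    munmap(mapped, size);
    close(fd);
    return 0;
}

Built with `g++ -O2 -std=c++17`; note that none of the timings above were measured with this variant.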
