
Completely new to C++.

I'm comparing various aspects of C++, C# and Ruby to see if there's a need for mirroring a library. Currently I'm testing a simple read of a file (post updated).

I'm compiling C++ and C# in VS 2017. The C++ is built in Release (x64) mode (or at least, I compile and then run).

The libraries more or less read a file and split each line into three parts. These make up the members of an object, and the objects are then stored in an array member.

For stress testing I tried a large file, 380 MB (7M lines). After the update I'm now getting similar performance with C++ and Ruby.

Purely reading the file and doing nothing else the performance is as below:

Ruby: 7s
C#:   2.5s
C++:  500+s (stopped running after awhile, something's clearly wrong)
C++(release build x64): 7.5s

The code:

#Ruby
file = File.open "test_file.txt"
while !file.eof 
    line = file.readline
end

//C#
StreamReader file = new StreamReader("test_file.txt");
string line;
while ((line = file.ReadLine()) != null) {
    // intentionally empty: only the read is being timed
}



//C++
#include "stdafx.h"
#include <string>
#include <iostream>
#include <ctime>
#include <fstream>
int main()
{
    std::ios::sync_with_stdio(false);
    std::ifstream file;
    file.open("c:/sandboxCPP/test_file.txt");
    std::string line;

    std::clock_t start;
    double duration;
    start = std::clock();
    while (std::getline(file, line)) {

    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "\nDuration: " << duration;
    while (true) 
    {

    }
    return 0;
}

Edit: The following performed incredibly well. 0.03s

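// Assumes `str` is a std::stringstream already holding the whole file
// (that code was not posted) and that <boost/algorithm/string.hpp> is included.
vector<string> lines;
string tempString = str.str();
boost::split(lines, tempString, boost::is_any_of("\n"));
start = clock(); // note: the timer starts after the read and the split
cout << "\nCount: " << lines.size();
size_t count = lines.size();
string s;
for (size_t i = 0; i < count; i++) {
    s = lines[i];
}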

I added the s = lines[i] assignment in case Boost is doing something I'm not aware of. It changed the performance.

Tested with a cout of a random record at the end of the loop.

Thanks

CBusBus
  • Are you running it in release mode? Are you using native c++? – drescherjm Jan 05 '18 at 21:16
  • @drescherjm I'm running it straight from VS 2017. Using ctrl+shift+b then ctrl+f5. – CBusBus Jan 05 '18 at 21:16
  • 1
    Could you post your actual complete source code. The c++ code you provided should not compile (it is missing a closing brace). – Jarra McIntyre Jan 05 '18 at 21:17
  • I think you are running a Debug build; switch your configuration to Release. – drescherjm Jan 05 '18 at 21:17
  • @JarraMcIntyre Sure, two minutes – CBusBus Jan 05 '18 at 21:18
  • @drescherjm That can’t be the entire issue, or even most of it. It shouldn’t result in that kind of slowdown. – Daniel H Jan 05 '18 at 21:18
  • I have seen Debug mode of Visual Studio take 100x as long as release mode. However I would not expect it to be the case here. In my case the main problem was lots of allocations and heap checks. – drescherjm Jan 05 '18 at 21:19
  • 1
    See this (maybe a duplicate): https://stackoverflow.com/q/6820765/10077 Stick `std::ios::sync_with_stdio(false);` in there and see if it speeds up dramatically. – Fred Larson Jan 05 '18 at 21:19
  • 6
    Regardless of the reason of slowdown, profiling a non-release build is pointless. You're literally asking the compiler to produce easier-to-debug output instead of fast output in debug mode, so why measure it? – GManNickG Jan 05 '18 at 21:21
  • @JarraMcIntyre. Thanks, I've added the code. There was a lot of commented out code in between but other than that it's from the source. – CBusBus Jan 05 '18 at 21:26
  • @GManNickG. I don't need to know exactly how well it performs in comparison, just a rough idea. I'm new to C++ so I thought it'd be worth checking out. – CBusBus Jan 05 '18 at 21:28
  • 1
    Please also try it in Release mode and report the difference in speeds. – Adrian McCarthy Jan 05 '18 at 21:28
  • @FredLarson Thanks for the suggestion. It didn't really make much of a difference. – CBusBus Jan 05 '18 at 21:30
  • See also https://stackoverflow.com/q/39309463/10077 I.e., if you use any `std::endl` that you didn't show, that can be a performance killer. – Fred Larson Jan 05 '18 at 21:30
  • Running the code from Visual Studio can also be a problem. It attaches debug hooks in order to catch exceptions and other errors. Even if you're running a Release build. Results can be different if you just run it from a command window. – Zan Lynx Jan 05 '18 at 21:34
  • Your `c++` code is not handling `i` correctly. I see no increment. Also you should use std::chrono for timing. – drescherjm Jan 05 '18 at 21:36
  • Ok, compiled in release mode and it reduced to roughly the same speed as Ruby. – CBusBus Jan 05 '18 at 21:36
  • A good start. Do you still have `std::ios::sync_with_stdio(false);` in the code? – user4581301 Jan 05 '18 at 21:38
  • @user4581301 Yeah, I've left that in. – CBusBus Jan 05 '18 at 21:38
  • @CBusBus Looking at the complete code, I cannot see any reason why it would take so long. The symptoms you described are consistent with a hung process. I would first create a small test file and check that your code works on that. If it does not, try stepping through with a debugger. – Jarra McIntyre Jan 05 '18 at 21:42
  • @JarraMcIntyre Cool. Not sure where to begin with that though, this is the first time I've really touched C++ – CBusBus Jan 05 '18 at 21:44
  • 1
    Your loop never increments `i`; it’ll never print even if it is reading the file. – Daniel H Jan 05 '18 at 21:44
  • @DanielH Sorry, I accidentally removed that in the previous edit. Added it again – CBusBus Jan 05 '18 at 21:45
  • 3
    @CBusBus It would help if we had the *full* file. Save a new file with all the commented out code removed, then try to compile and run that. It should have things like `#include `, `int main`, etc. – Daniel H Jan 05 '18 at 21:48
  • 1
    @CBusBus Debugging C++ is the same as C#. You can set a breakpoint in VS by clicking the LHS margin. Then press the run button and step through. There are multiple reasons a console program might hang. E.g. waiting for input (where does the path variable come from?), blocked on stdout if text is selected in the console, or if the debugger is attached waiting for the user to hit continue/next, or being stuck in an infinite loop (in code not posted?). If you had to force terminate after 500s that is a sign of blocking/being stuck in an infinite loop, not of C++ being slow. That is simply too slow. – Jarra McIntyre Jan 05 '18 at 22:02
  • @JarraMcIntyre I used the timer in the function to ensure it wasn't an infinite loop. I've just tested the loop with nothing in it and a clock within the function; the time for 1.2M lines is 1.6s, around .3s more than Ruby. I never really use breakpoints, I usually just have something kill it once it reaches somewhere or print to the console. – CBusBus Jan 05 '18 at 22:07
  • @DanielH. Thanks. I've moved all the code into a self-contained console app. Added the source above. – CBusBus Jan 05 '18 at 22:16
  • And now would be the time to clean up the whole question, showing only relevant stuff instead of a series of edits (especially since the code shown does nothing like what you describe in the first paragraphs). Ideally add a quick piece of how to come up with a test input file. So that we're looking at the same thing here (and not every single one of us has to go hunting for a large text file)... – DevSolar Jan 05 '18 at 22:24
  • And something is *wildly* off here. With a 3 MB file (which gets processed in <1 second real time) I get an output of "15". That's *obviously* wrong. – DevSolar Jan 05 '18 at 22:29
  • 1
    Well, this is wrong: `duration = std::clock() - start / (double)CLOCKS_PER_SEC;` It should be `duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;`. – Fred Larson Jan 05 '18 at 22:33
  • @DevSolar. Sorry, thought didn't occur to me. I've updated the question. – CBusBus Jan 05 '18 at 22:33
  • @FredLarson: Yep, just got there myself (by printing start and end `clock_t` directly). Time to go to bed if I don't immediately spot an operator precedence problem anymore. ;-) – DevSolar Jan 05 '18 at 22:36
  • @FredLarson My mistake. I wrote that again when I moved to the single console app version. Results were consistent so it never stood out. Updated and ran again with the same results. – CBusBus Jan 05 '18 at 22:37
  • @CBusBus: Cannot compare as I don't have your file, but I get 1.1s on 390 MB of LaTeX source (Cygwin64 GCC 6.4.0 at `-O3`)... – DevSolar Jan 05 '18 at 22:41
  • @DevSolar My laptop is a few years old so that might be something to do with it but I kind of expected C++ to at least perform as well as C#. No matter what I change it doesn't seem to improve (beyond the earlier suggestion of compiling as release (never knew that was a thing until today)). – CBusBus Jan 05 '18 at 22:44
  • Removing the infinite loop at the end should help C++ end faster. – Eljay Jan 05 '18 at 22:55

2 Answers

2

Based on the comments and the originally posted code (since fixed, and now deleted), there was previously a coding error (a missing i++) that stopped the C++ program from outputting anything. Together with the while(true) loop in the complete code sample, this would produce symptoms consistent with those in the question: the user waits 500s, sees no output and force terminates the program. The program would finish reading the file without printing anything and then enter the deliberately added infinite loop.

The revised complete source code correctly completes (according to the comments) in ~1.6s for a 1.2 million line file. My advice for improving performance would be as follows:

  1. Make sure you are compiling in release mode (not debug mode). Given the user has specified they are using Visual Studio 2017, I would recommend viewing the official Microsoft documentation (https://msdn.microsoft.com/en-us/library/wx0123s5.aspx) for a thorough explanation.

  2. To make problems easier to diagnose, do not add an infinite loop at the end of your program. Instead, run the executable from PowerShell or cmd and confirm that it terminates correctly.

EDIT: I would also add:

  3. For accurate timings you also need to take into account the OS disk cache. Run each benchmark multiple times to 'warm up' the disk cache.
Jarra McIntyre
  • Thanks for the advice. In the end I used boost::split with a loop and it reduced to 0.03s for 1.24M lines. I've accepted the answer as debug compile mode was the significant issue. – CBusBus Jan 05 '18 at 23:48
  • No problems. Glad it is resolved. Thanks for the accept. – Jarra McIntyre Jan 06 '18 at 00:00
0

C++ doesn’t automatically write everything the instant you tell it to. Instead, it buffers the data so it can write it all at once, which is usually faster. To say “I really want to write this now.”, you need to say something like std::cout << std::flush (if you use std::endl to end your lines it does this automatically).

Usually you don’t need to do this; the buffers are flushed when the program exits, or when you ask for input from the user, or things like that. However, your program doesn’t exit, so it never flushes its buffer. You read the input, and then the program is executing while(true) forever, never giving the output.

The solution to this is simple: remove the while loop at the end of the program. You should not have that; people usually assume a console program exits when it’s finished. I would’ve guessed you had that because Visual Studio automatically closed the console window when the program was finished, but apparently it doesn’t do that with Ctrl+F5, which you use, so I’m not sure.

Daniel H
  • Won't the '\n' cause a flush anyway? It won't infinitely buffer afaict. – Jarra McIntyre Jan 05 '18 at 22:39
  • @JarraMcIntyre That would flush the OS buffers (I think; I’m more familiar with POSIX low-level details like that), but `iostreams` has its own internal buffers. I just confirmed (again, not on Windows) that `std::cout << "Hello, world!\n";` followed immediately by either an infinite loop or `std::abort();` will not print anything with `sync_with_stdio` off (it will if I comment out that line, though, which surprised me). – Daniel H Jan 05 '18 at 22:48
  • Oops, I was confused about where some of the buffering happened, which was why I was surprised by the effect of `sync_with_stdio`. If you are syncing, then you will definitely be line-buffered (because the underlying C streams are) and the `\n` causes a flush. If you turn off the `sync_with_stdio`, then I don’t think iostreams has a line-buffering concept at all. – Daniel H Jan 05 '18 at 22:57
  • According to the cpp reference cout does provide some guarantees about when it flushes (std::flush, std::endl, before std::system calls) but whether '\n' flushes is implementation defined. I believe your test shows that it does flush on the MS stl implementation if sync_with_stdio is enabled. This makes sense as, IIRC, stdout does flush on '\n'. If you disable that syncing then std::cout can use its own buffering implementation which does not necessarily have the same behaviour. – Jarra McIntyre Jan 05 '18 at 23:05