The answer to almost any optimization problem is "first, profile". Profile your C++ application and determine where the time is being spent.
Still, I can make some educated guesses about what is slow here, and point out how this would show up in the profiler.
Slow getline()
getline() may be implemented in a slow manner. For example, it may need to ask the runtime for one character at a time, since it needs to stop reading once it sees the newline character. That is, it cannot ask for bytes in bigger chunks, since it has no guaranteed way to "put back" the rest of the chunk when a newline appears in the middle of the chunk.
The runtime is almost certainly going to buffer the underlying file read, so this won't be anywhere near as bad as one system call per character, but the overhead of effectively calling getc for every character in the file can still be significant.
This would show up in the profiler as a lot of time spent in getline(), and in particular in some getc()-like method that getline() calls.
The Python implementation doesn't have this issue at all, since a single readlines() call is made and the implementation knows the entire file will be read and can buffer at will.
Redundant Copying
The other likely candidate is redundant copying.
First, the runtime makes read() calls and copies chunks of the file into an internal buffer. Then the getline() implementation is likely going to have an internal buffer of char[] where it builds up the string before passing it to a string constructor, which likely makes another copy (unless the runtime is using internal tricks to hand off the buffer directly).
Then, as Johnny_S points out, there may be more copies when you push these strings into the vector.
This would show up in the profiler as time spread across the various copies mentioned above, e.g., in the string() constructor.
The Python implementation can also avoid most of these redundant copies, since it has a higher-level view of the problem than the layered approach in your C++ implementation, so it likely makes only one or two copies.
Solutions
The solution mentioned here fixes both of the problems above. To re-implement the Python readlines call, you should go a bit lower level. Read the file in chunks of char[], and look for newline characters directly in the buffer. Technically, you don't need to create string objects at all, since you only output the number of lines found, but if you do want to create those objects, make sure you only copy the char[] data once into each string.
You can do this using the string(const char* s, size_t n) constructor, pointing directly into your character buffer. Finally, ensure you don't make another copy when you insert into your vector, as Johnny_S suggests.