-2

I am trying to write the string s.substr(space_pos) to a file, or save it in a vector, as fast as possible. I tried writing it to a file with ofstream and printing it with cout, but both take a long time. The text file is 130 MB.

This is the code:

fstream f(legitfiles.c_str(), fstream::in );
string s;
while(getline(f, s)){
    size_t space_pos = s.rfind(" ") + 1;

    cout << s.substr(space_pos) << endl;
    ofstream results("results.c_str()");
    results << s.substr(space_pos) << endl;
    results.close();

}
cout << s << endl;
f.close();

Is there a way to write or print the string in a faster way?

  • If you have one available, run your program [through a profiler](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) to find out which of the operations is murdering your program's performance. It'll probably turn out to be the unnecessary file IO that comes from the stream flush in the `endl`. Consider using '\n' instead to avoid the flush. – user4581301 Jul 30 '18 at 22:09
  • Some reading on that, and possibly a duplicate: [C++: “std::endl” vs “\n”](https://stackoverflow.com/questions/213907/c-stdendl-vs-n) – user4581301 Jul 30 '18 at 22:11
  • If you want "as fast as possible" you probably shouldn't be using `std::endl` which does a `std::flush` in addition to writing the newline - you probably want to be writing just a `'\n'` rather than `std::endl` most of the time. – Jesper Juhl Jul 30 '18 at 22:29
  • Have you tried using `write` statements directly to a file descriptor number, via `open` and `close`? – Eljay Jul 30 '18 at 22:30
  • If your code works and you're wondering how to make it better, Stack Overflow is not the place for you. Ask this question in [Code Review](https://codereview.stackexchange.com/) – Fureeish Jul 30 '18 at 22:37
  • Why are you opening and closing the file every time around the loop? – Martin York Jul 30 '18 at 23:58

3 Answers

4

Uncouple the C++ stream from the C stream:

std::ios_base::sync_with_stdio(false);

Remove the coupling between cin and cout:

std::cin.tie(NULL);

Now don't use std::endl: it needlessly flushes the stream buffer after every line, and flushing is expensive. Use the newline character '\n' instead and leave the buffer flushing to the stream.

Also, don't build an extra string you don't need. Use a std::string_view (C++17), which avoids the copy:

s.substr(space_pos)

//replace with:
std::string_view  view(s);
view.substr(space_pos);

If you don't have a modern compiler just use C-Strings.

s.data() + space_pos
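
As a minimal sketch of the string_view suggestion (C++17 assumed), the substring step could be wrapped in a small function. The name `last_field` is hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns a view of the text after the last space in
// `line`. No characters are copied; the view is only valid while the
// underlying string is alive and unmodified.
std::string_view last_field(std::string_view line)
{
    // npos + 1 wraps to 0, so a line without a space is returned whole.
    const std::size_t space_pos = line.rfind(' ') + 1;
    return line.substr(space_pos);
}
```

Inside the read loop the view can be streamed directly, for example `results << last_field(s) << '\n';`, as long as `s` is not modified while the view is in use.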
Martin York
pablo285
  • nothing has changed ... the speed in writing remains the same, I had already tried. – Jospeh Body Jul 30 '18 at 22:27
  • @JospehBody Try again with these additions. Removing the flushing speeds things up if you are using files. But if you are using cin/cout, there are a few other things that need to be taken into account; once they have been done, removing the extra flush will help dramatically. – Martin York Jul 31 '18 at 00:00
  • `std::ios_base::sync_with_stdio(false);` or `std::cin.tie(NULL);` do nothing to increase I/O-speed with `fstream`s ... – Swordfish Aug 09 '18 at 21:26
2

You are duplicating the substring. I suggest creating a temporary:

ofstream results("results.txt");  // open once, before the loop; the file name is a placeholder
while(getline(f, s)){
    size_t space_pos = s.rfind(" ") + 1;
    const std::string sub_string(s.substr(space_pos));  // build the substring once
    cout << sub_string << "\n";
    results << sub_string << "\n";
}
results.close();

You'll need to profile to see if the next code fragment is faster:

while(getline(f, s))
{
    static const char newline[] = "\n";
    size_t space_pos = s.rfind(" ") + 1;
    const std::string sub_string(s.substr(space_pos));
    const size_t length(sub_string.length());
    cout.write(sub_string.c_str(), length);
    cout.write(newline, 1);
    results.write(sub_string.c_str(), length);
    results.write(newline, 1);
}

The idea behind the second fragment is that it bypasses the formatting process and writes the contents of the string directly to the output stream. Measure both fragments to see which is faster: start a clock, run each fragment for at least 1E6 iterations, stop the clock, and take the average.

If you want to speed up the file writing, remove the writing to std::cout.

Edit 1: multiple threads
You may be able to get some more efficiency out of this by using multiple threads: a "Read Thread", a "Processing Thread" and a "Writing Thread".
The "Read Thread" reads the lines and appends them to a buffer. Start this one first. After a delay, the "Processing Thread" performs the substr method on all the strings.
After about N strings have been processed, the "Writing Thread" starts and writes the substr strings to the file.

This technique uses double buffering. One thread reads and places data into a buffer. When the buffer is full, the Processing Thread starts processing it and places the results into a second buffer. When the second buffer is full, the Writing Thread starts and writes that buffer to the results file. There should be at least 2 "read" buffers and 2 "write" buffers. The number and size of the buffers should be adjusted to get the best performance from your program.

Thomas Matthews
-1

//Edit: Please note that this answer solves a different problem than that stated in the question. It will copy each line skipping everything from the beginning of the line up to and including the first whitespace.

It might be faster to read the first word of a line and throw it away before getline()ing the rest of it, instead of using string::find() and string::substr(). You should also avoid opening and closing the output file on every iteration.

#include <string>
#include <fstream>

int main()
{
    std::ifstream is{ "input" };
    std::ofstream os{ "output" };
    std::string str;
    str.reserve(1024); // change 1024 to your estimated line length.

    while (is.peek() == ' ' || is >> str, std::getline(is, str)) {
        str += '\n';           // save an additional call to operator<<(char)
        os << str.data() + 1;  // +1 ... skip the space
        // os.write(str.data() + 1, str.length() - 1); // might be even faster
    }
}
Swordfish
  • Not the same as the original. A line with a leading space is handled differently. The `operator>>` ignores leading white space before reading the word. – Martin York Jul 30 '18 at 23:49
  • Don't think that does what you think it does. Also it will not solve the problem. What happens if the line starts with two spaces? – Martin York Jul 31 '18 at 00:35
  • @MartinYork Yes, it does what I think it does ;) The problem is merely that I missed the `r` in `s.rfind(" ") + 1` in the code of the question. :( – Swordfish Jul 31 '18 at 00:39