9

I have made a program to compute the permutations of an 8 character string "sharjeel".

#include <iostream>
#include <time.h>

char string[] = "sharjeel";
int len = 8;
int count = 0;

void swap(char& a, char& b){
    char t = a;
    a = b;
    b = t;
}
void permute(int pos) {
    if(pos==len-1){
        std::cout << ++count << "\t" << string << std::endl;
        return;
    }
    else {
        for (int i = pos; i < len;i++)
        {
            swap(string[i], string[pos]);
            permute(pos + 1);
            swap(string[i], string[pos]);
        }
    }
}

int main(){
    clock_t start = clock();
    permute(0);
    std::cout << "Permutations: " << count << std::endl;
    std::cout << "Time taken: " << (double)(clock() - start) / (double)CLOCKS_PER_SEC << std::endl;
    return 1;
}

If I print each permutation along it takes about 9.8 seconds for the execution to complete.

40314   lshaerej
40315   lshareej
40316   lshareje
40317   lshareej
40318   lshareje
40319   lsharjee
40320   lsharjee
Permutations: 40320
Time taken: 9.815

Now if I replace the line:

std::cout << ++count << "\t" << string << std::endl;

with this:

++count;

and then recompile, the output is:

Permutations: 40320
Time taken: 0.001

Running again:

Permutations: 40320
Time taken: 0.002

Compiled using g++ with -O3

Why is std::cout so relatively time consuming? Is there a way to print that is faster?

EDIT: Made a C# version of the program

/*
 * Permutations
 * in c#
 * much faster than the c++ version 
 */

using System;
using System.Diagnostics;

namespace Permutation_C
{
    class MainClass
    {
        private static uint len;
        private static char[] input;
        private static int count = 0;

        public static void Main (string[] args)
        {
            Console.Write ("Enter a string to permute: ");
            input = Console.ReadLine ().ToCharArray();
            len = Convert.ToUInt32(input.Length);
            Stopwatch clock = Stopwatch.StartNew();
            permute (0u);
            Console.WriteLine("Time Taken: {0} seconds", clock.ElapsedMilliseconds/1000.0);
        }

        static void permute(uint pos)
        {

            if (pos == len - 1u) {
                Console.WriteLine ("{0}.\t{1}",++count, new string(input));
                return;
            } else {
                for (uint i = pos; i < len; i++) {
                    swap (Convert.ToInt32(i),Convert.ToInt32(pos));
                    permute (pos + 1);
                    swap (Convert.ToInt32(i),Convert.ToInt32(pos));
                }
            }

        }
        static void swap(int a, int b) {
            char t = input[a];
            input[a] = input[b];
            input[b] = t;
        }
    }
}

Output:

40313.  lshaerje
40314.  lshaerej
40315.  lshareej
40316.  lshareje
40317.  lshareej
40318.  lshareje
40319.  lsharjee
40320.  lsharjee
Time Taken: 4.628 seconds
Press any key to continue . . .

From here, Console.WriteLine() seems almost twice as fast when compared with the results from std::cout. What seems to be slowing std::cout down?

Servy
  • 202,030
  • 26
  • 332
  • 449
Saad
  • 371
  • 1
  • 3
  • 13
  • 2
    You're flushing the buffer at each permutation. Try again with `'\n'` instead of `std::endl`. Still, it will slow for reasons explained in the answer, but you may observe a speed-up. – juanchopanza Mar 27 '17 at 16:52
  • [C++ cout printing slowly](http://stackoverflow.com/q/1736267/995714), [Why is scanf/printf faster than cin/cout?](http://stackoverflow.com/q/18048946/995714), [Why is reading lines from stdin much slower in C++ than Python?](http://stackoverflow.com/q/9371238/995714) – phuclv Mar 27 '17 at 16:53
  • 3
    Consider using [`std::ios_base::sync_with_stdio`](http://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio). do disable synchronization between `printf` and `std::cout`. – François Andrieux Mar 27 '17 at 16:59
  • @nos Without optimization 'Time taken: 0.001' – Saad Mar 27 '17 at 17:00
  • Interesting that the C# version is faster - Console.WriteLine blocks until the output is completely written : http://stackoverflow.com/questions/3670057/does-console-writeline-block Maybe something similar (but slower) happening with C++ – PaulF Mar 27 '17 at 17:00
  • Note that if you never output the result, the compiler is usually smart enough to optimize away the entire calculation. This explains why it's instant with no `cout`. (There are other ways around this, e.g. calling a function from another compile unit with the result, using `volatile`, etc.) – Cameron Mar 27 '17 at 17:02
  • If you don't want to flush the output stream on each write then don't use `std::endl` - use `\n` instead. – Jesper Juhl Mar 27 '17 at 17:25
  • Don't know what is your environment but it takes only 0.060076s on my machine (0.000883 without output). My observed performance factor is 70x longer with output, and 500x/1000x longer in your case. – Jean-Baptiste Yunès Mar 27 '17 at 17:27

1 Answers1

19

std::cout ultimately results in the operating system being invoked.

If you want something to compute fast, you have to make sure that no external entities are involved in the computation, especially entities that have been written with versatility more than performance in mind, like the operating system.

Want it to run faster? You have a few options:

  1. Replace << std::endl; with << '\n'. This will refrain from flushing the internal buffer of the C++ runtime to the operating system on every single line. It should result in a huge performance improvement.

  2. Use std::ios::sync_with_stdio(false); as user Galik Mar suggests in a comment.

  3. Collect as much as possible of your outgoing text in a buffer, and output the entire buffer at once with a single call.

  4. Write your output to a file instead of the console, and then keep that file displayed by a separate application such as Notepad++ which can keep track of changes and keep scrolling to the bottom.

As for why it is so "time consuming", (in other words, slow,) that's because the primary purpose of std::cout (and ultimately the operating system's standard output stream) is versatility, not performance. Think about it: std::cout is a C++ library function which will invoke the operating system; the operating system will determine that the file being written to is not really a file, but the console, so it will send the data to the console subsystem; the console subsystem will receive the data and it will start invoking the graphics subsystem to render the text in the console window; the graphics subsystem will be drawing font glyphs on a raster display, and while rendering the data, there will be scrolling of the console window, which involves copying large amounts of video RAM. That's an awful lot of work, even if the graphics card takes care of some of it in hardware.

As for the C# version, I am not sure exactly what is going on, but what is probably happening is something quite different: In C# you are not invoking Console.Out.Flush(), so your output is cached and you are not suffering the overhead incurred by C++'s std::cout << std::endl which causes each line to be flushed to the operating system. However, when the buffer does become full, C# must flush it to the operating system, and then it is hit not only by the overhead represented by the operating system, but also by the formidable managed-to-native and native-to-managed transition that is inherent in the way it's virtual machine works.

Mike Nakis
  • 56,297
  • 11
  • 110
  • 142
  • But Console.WriteLine() is 2x faster – Saad Mar 27 '17 at 16:52
  • 2
    Also `std::ios::sync_with_stdio(false);` as the first line makes a difference too. – Galik Mar 27 '17 at 18:12
  • @Galik Thank you so much for this comment!!! I knew my stdio output was most of the problem with logging an async task, but this definitely make it easier to see for me that it was mostly OS io lag. I combined this with removing the `std::endl` from my prints and got amazing results. – SeedyROM Sep 16 '19 at 10:53
  • 1
    There is no way on Earth that outputting to a file and keeping track of it in Notepad++ is going to be faster than the Windows console, however slow it might be. Also, modern Windows versions tipically use a truetype font for the console, not raster fonts (but, in all fairness, you didn't specifically said otherwise - you said the *display* is raster, which yeah, it's true). Alas, the calls to the graphics subsystem must be done regardless of what program is displaying the output, even np++. Finally, I wouldn't state that scrolling the console requires *large* amounts of video RAM, not at all. – Marc.2377 Sep 23 '19 at 03:57
  • @Marc.2377 I admit I have not benchmarked it, so it might not be faster. I am just making suggestions for different approahes that will all be faster than the initial approach, without making any claims as to which one of those better approaches is best. – Mike Nakis Sep 23 '19 at 17:17
  • Then, keep in mind that what counts is **not** the total amount of clock cycles needed to receive and display a chunk of data, but rather, the clock cycles needed to get that chunk of data out of the application's process, so that the application can continue doing its own stuff. That's what (mostly) determines the apparent speed of the application. – Mike Nakis Sep 23 '19 at 17:18
  • So, once the chunk of data is out of the application and into the operating system's buffers, the application is free. Notepad++ will handle that chunk of data in a different process, so the time it will consume to display it is irrelevant. I do not know what percentage of the console's inner workings is done within the context of the calling process and how much is done in a separate process. That's what would ultimately determine which approach would be faster. – Mike Nakis Sep 23 '19 at 17:18
  • A valid point, but writing to disk is also not all that fast, especially when there is another application reading from it at the same time. – Marc.2377 Sep 23 '19 at 22:12
  • @Marc.2377 yes, but the OS will usually buffer the data to be written, and do the actual writing later, not from within the process of interest. You would have to be writing at such a high rate as to fill up the operating system's buffers before you would be hit by disk-write delays. Anyhow, I am not saying it is all just fine, I am just saying it might not be too bad. – Mike Nakis Sep 24 '19 at 08:23