On Windows, the C/C++ runtime does not buffer stdout by default when the output is an interactive console. Other platforms do something similar, but the default Windows console is surprisingly slow, so if you write a lot of text to it your program will be slow.
Python, on the other hand, defaults to line-based buffering, so it only pushes output to the console once per line instead of once per write.
For the Python side, if you run your script with a command like this:
python -u script_name.py
It should take roughly as long as the C/C++ version does, because the -u flag makes stdout unbuffered and so the script is doing the same amount of work as far as the console is concerned.
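To see why the two modes differ so much, here is a minimal sketch that counts how many writes actually reach the output device. `CountingSink` and `count_console_writes` are hypothetical names made up for this illustration, and the sink only stands in for the console. It also assumes that `-u` amounts to a write-through text layer over an unbuffered stream, which is how I understand CPython 3 sets up stdout in that mode.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical stand-in for the console: counts the writes that reach it."""

    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, data):
        # Pretend the whole chunk was accepted and record one device write.
        self.calls += 1
        return len(data)


def count_console_writes(unbuffered, count=10000):
    """Write 0..count-1 plus a newline and return how many writes hit the sink."""
    sink = CountingSink()
    if unbuffered:
        # Assumed approximation of `python -u`: no binary buffer, write-through text layer.
        stream = io.TextIOWrapper(sink, encoding="utf-8", write_through=True)
    else:
        # Text layer over an 8 KiB binary buffer, similar to ordinary buffered output.
        stream = io.TextIOWrapper(
            io.BufferedWriter(sink, buffer_size=8192), encoding="utf-8"
        )
    for i in range(count):
        stream.write(str(i))
    stream.write("\n")
    stream.flush()
    return sink.calls
```

Calling `count_console_writes(True)` and `count_console_writes(False)` should show the unbuffered stream hitting the sink about once per write call, while the buffered one only does so a handful of times. On a real console each of those writes is a separate trip to the terminal, which is where the time goes.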
On the other hand, you can see the effect from the C++ side with a program like this:
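#include <chrono>
#include <cstdio>
#include <iostream>

int main() {
    // First pass: the runtime's default buffering for a console stdout.
    auto t1 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 10000; i++) {
        std::cout << i;
    }
    std::cout << std::endl;
    auto t2 = std::chrono::high_resolution_clock::now();

    // Second pass: give stdout an 8 KiB buffer, then repeat the same output.
    // (The Microsoft CRT documents _IOLBF as behaving like _IOFBF.)
    std::setvbuf(stdout, nullptr, _IOLBF, 8192);
    for (int i = 0; i < 10000; i++) {
        std::cout << i;
    }
    std::cout << std::endl;
    auto t3 = std::chrono::high_resolution_clock::now();

    // Report both timings in milliseconds.
    std::cout << "Default behavior: " << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << std::endl;
    std::cout << "Buffered behavior: " << std::chrono::duration_cast<std::chrono::milliseconds>(t3 - t2).count() << std::endl;
}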
This shows the performance hit of the default buffering mode compared with a buffering mode that more closely resembles what Python does.
On my machine it outputs this at the end (times in milliseconds), showing more than a 20-fold speedup from buffering:
Default behavior: 751
Buffered behavior: 34
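If you want a comparable measurement from the Python side, something like the following sketch should do. The helper names `time_writes` and `compare` are made up for this example, and flushing after every write is only my approximation of unbuffered output. The numbers will depend on your console, so none are shown here.

```python
import io
import time


def time_writes(stream: io.TextIOBase, count: int, flush_each: bool) -> float:
    """Write the integers 0..count-1 to stream and return elapsed milliseconds.

    flush_each=True pushes every write through to the underlying device,
    which approximates unbuffered output.
    """
    start = time.perf_counter()
    for i in range(count):
        stream.write(str(i))
        if flush_each:
            stream.flush()
    stream.write("\n")
    stream.flush()
    return (time.perf_counter() - start) * 1000.0


def compare(stream: io.TextIOBase, count: int = 10000):
    """Return (flushed_ms, buffered_ms) for the same output on one stream."""
    flushed_ms = time_writes(stream, count, flush_each=True)
    buffered_ms = time_writes(stream, count, flush_each=False)
    return flushed_ms, buffered_ms
```

Run it from a script in a real console with `compare(sys.stdout)` (after `import sys`) and print the two results to stderr so the report does not get mixed into the timed output.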