How to input millions of integers quite fast in C++?

Question

I'm doing a data structure programming assignment about stack in C++.

In this assignment, I should read lots of integers (in the worst case, 1,600,000 integers) and finally output some strings.

As a student, I submit my cpp source file and the website judges and scores my source code. I got 100% but I want to do better. The time restriction of this assignment is 2 seconds and the execution time of my source code is 128 milliseconds. However, the top student only used 52 milliseconds to complete the task. So I want to know how to make my code faster.

My source code mainly contains three parts:

use cin to read lots of integers from the OnlineJudge system(up to 1,600,000 integers).
try to find the solution and store it in a char array.
use cout to output the char array.

OnlineJudge tells me the execution time of my code. The 1st part takes 100 milliseconds, the 2nd part takes 20 milliseconds, and the 3rd part takes 12 milliseconds. So if I want to make my code faster, I should improve input speed.

Input of OnlineJudge is like this:

5 2
1 2 3 5 4

The 1st line is two integers n and m, the 2nd line is n integers separated by spaces. Restrictions are: 1<=n<=1,600,000 and 0<=m<=1,600,000. In order to read more than 1 million integers, my code is like this:

#include <iostream>
using namespace std;
int main()
{
    std::ios::sync_with_stdio(false);
    cin.tie(NULL);
    int n, m;
    int *exit = new int[1600000];
    cin>>n>>m;
    for (int i=0;i<n;++i)
        cin>>exit[i];
    return 0;
}

If n is small, OnlineJudge says the execution time is 0 milliseconds. If n is very large, e.g. 1,600,000, OnlineJudge says this code takes 100 milliseconds. If I delete

std::ios::sync_with_stdio(false);
cin.tie(NULL);

Then the code takes 424 milliseconds. However, reading integers is necessary in this assignment, so I'm really curious about how the top student can finish "cin,find the solution,cout" within only 52 milliseconds.

Do you have any ideas on improving input speed?

2019.4.17：Someone suggests using vector or std::from_chars, but in this assignment these are banned. If I write

#include <vector>

or

#include <charconv>

or

#include <array>

then OnlineJudge says "Compilation error".

Someone suggests using scanf, my code is like this:

for (int i=0;i<n;++i)
    scanf("%d", &exit[i]);

But the execution time is 120 milliseconds. By the way, I don't think scanf is faster than cin; see the question "Using scanf() in C++ programs is faster than using cin?".

Someone suggests using getline. I seldom use this function; my code is like this:

stringstream ss;
string temp;
getline(cin, temp);
ss<<temp;ss>>n;ss>>m;
ss.clear();temp.clear();
getline(cin, temp);ss<<temp;
for (int i=0;i<n;++i)
    ss>>exit[i];

Execution time is also 120 milliseconds.
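As the comments point out, getline by itself changes little while the conversion still goes through a stringstream. Here is a minimal sketch of the alternative: read the line once and convert the digits by hand. The helper name `parse_line` is made up for illustration, and it assumes the input contains only non-negative decimal integers separated by whitespace, with no overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: parses up to `max_count` non-negative decimal integers
// from `line` into `out` and returns how many were stored.
// Assumes the line holds only digits and separators, and no value overflows int.
std::size_t parse_line(const std::string& line, int* out, std::size_t max_count)
{
    assert(out != nullptr || max_count == 0);
    std::size_t count = 0;
    std::size_t pos = 0;
    const std::size_t len = line.size();
    while (count < max_count) {
        // Skip everything that is not a digit (spaces, '\r', ...).
        while (pos < len && (line[pos] < '0' || line[pos] > '9'))
            ++pos;
        if (pos == len)
            break;
        // Accumulate one run of digits.
        int value = 0;
        while (pos < len && line[pos] >= '0' && line[pos] <= '9') {
            value = value * 10 + (line[pos] - '0');
            ++pos;
        }
        out[count++] = value;
    }
    return count;
}
```

With the variables from the question, this would be called as `getline(cin, temp); parse_line(temp, exit, n);` for the second input line.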

Someone suggests using mmap. I've never heard of this function before. It seems it is only available on Unix? But I'm using Visual Studio 2010. My code is like this:

#include <unistd.h>
#include <sys/mman.h>
    //to load 1,600,000 integers
    int *exit = static_cast<int*>(mmap(NULL,1600*getpagesize(),PROT_READ,MAP_ANON|MAP_SHARED,0,0));
    for (int i=0;i<n;++i)
        cin>>*(exit+i);

OnlineJudge says "Runtime error (signal 11)" instead of "Compilation error". Signal 11 means "Invalid memory reference"; this signal is sent to a process when it makes an invalid virtual memory reference, i.e. a segmentation fault. I don't know if there's anything wrong with my mmap. I hope you can tell me.

2019.4.22：Thanks for all your help. Now I have solved this problem successfully. The key function is mmap. The code is like this:

#include <sys/mman.h>
    cin.tie(NULL);
    std::ios::sync_with_stdio(false);
    string temp;

    int n,m;
    int *exit = new int[1600000];

    const int input_size = 13000000;
    void *mmap_void = mmap(0,input_size,PROT_READ,MAP_PRIVATE,0,0);
    char *mmap_input = (char *)mmap_void;
    int r=0,s=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    while (mmap_input[s]>='0' && mmap_input[s]<='9')
    { r=r*10+(mmap_input[s]-'0');++s; }
    n=r;r=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    while (mmap_input[s]>='0' && mmap_input[s]<='9')
    { r=r*10+(mmap_input[s]-'0');++s; }
    m=r;r=0;
    while (mmap_input[s]<'0' || mmap_input[s]>'9') ++s;
    for (int i=0;i<n;++i)
    {
        while (mmap_input[s]>='0' && mmap_input[s]<='9')
        { r=r*10+(mmap_input[s]-'0');++s; }
        ++s;
        exit[i]=r;r=0;
    }

Mapping the input with mmap and converting chars to integers takes 8 milliseconds. The total execution time of this homework is now 40 milliseconds, faster than 52 milliseconds.
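The code above maps a fixed 13,000,000 bytes from file descriptor 0, which relies on the judge redirecting a regular file into stdin. A slightly more defensive sketch, assuming a POSIX system, asks `fstat` for the real size first and reports failure instead of crashing. The helper name `map_whole_file` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <sys/stat.h>

// Hypothetical helper: maps the regular file behind `fd` read-only.
// Returns a pointer to its bytes and stores the length in `size`.
// Returns nullptr if `fd` is not a non-empty regular file or mmap fails.
const char* map_whole_file(int fd, std::size_t& size)
{
    assert(fd >= 0);
    size = 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_size <= 0)
        return nullptr;                 // pipe, terminal, or empty file
    void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                   PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        std::perror("mmap");
        return nullptr;
    }
    size = static_cast<std::size_t>(st.st_size);
    return static_cast<const char*>(p);
}
```

Calling `map_whole_file(0, size)` covers the stdin case. When it returns `nullptr` (for example because stdin is a pipe), the program can fall back to ordinary reading. Parsing should stop at `data + size` rather than rely on a non-digit byte following the last number.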

Welcome to C++. One of the first things you should read about is how to effectively use Standard Library containers like [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) instead of doing raw allocations of C-style arrays with `new[]`. In C++ you're 100% responsible for memory management, so doing manual allocations means being careful to add in the corresponding `delete[]` calls at the right place and time. Standard Library containers can handle this for you. — tadman, Apr 16 '19 at 17:28
iostreams from standard library are pretty well known to be quite slow for formatted input but they are universal - for example they support different language conventions and so on. If you need speed I would suggest at least read it line by line and convert strings to integers manually. Also if you need to calculate something based on those numbers it could be faster to do it in place and to avoid storing them all in memory. — Slava, Apr 16 '19 at 17:35
It's a quite well-known fact that formatted input/output via `std::` streams is, unfortunately, not the fastest. The other student is probably using the likes of `scanf()` from the C standard library, which are typically faster. Even the C standard library functions, however, are forced to respect the formatting rules set up by the current locale. You most likely just need to read integers without worrying about any special formatting. The probably fastest thing next to parsing the integers yourself would most likely be [std::from_chars](https://en.cppreference.com/w/cpp/utility/from_chars)… — Michael Kenzel, Apr 16 '19 at 17:44
Hi JiaCheng Liu, welcome to Stack Overflow. Here are some similar questions you might find helpful: https://stackoverflow.com/questions/16826422/c-most-efficient-way-to-convert-string-to-int-faster-than-atoi https://stackoverflow.com/questions/4351371/c-performance-challenge-integer-to-stdstring-conversion — Kevin Chen, Apr 16 '19 at 17:56
Thanks for your advice. But sorry I forgot to explain this yesterday. In this assignment if one uses "#include " then OnlineJudge says compilation error. @tadman — JiaCheng Liu, Apr 17 '19 at 01:43
@MichaelKenzel I tried scanf, it takes 120 milliseconds. I edit the question to include scanf code. #include is banned in this assignment. — JiaCheng Liu, Apr 17 '19 at 02:26
Well, I guess then you'll, unfortunately, have to write your own parser. Luckily, a basic parser for plain decimal integers is quite simple to write. Thanks for pointing me to this other question concerning `std::cin` performance! That's quite interesting, I was not aware of the fact that the synchronization with C stdio is a major bottleneck there… — Michael Kenzel, Apr 17 '19 at 02:32
Two questions. It looks like https://stackoverflow.com/a/42446361/1466970 says to do the untie first, `std::cin.tie (nullptr);`, and then turn off sync, `std::cout.sync_with_stdio(false);`. Not sure if that makes a difference or not. Second, does using pointers as in `int i, *p = exit; for (i=0;i> *p;` make a difference? — Richard Chambers, Apr 17 '19 at 02:44
What is the actual problem you need to solve with these integers? I’m wondering if you can avoid fully parsing them. — Daniel H, Apr 17 '19 at 03:04
@Slava I tried getline.I edit the question to include my getline code. Execution time is 120 milliseconds. — JiaCheng Liu, Apr 17 '19 at 03:05
I would hand-code a state machine that's sole purpose is to read integers. I would read large blocks of input data and pass it to my state machine. — David Schwartz, Apr 17 '19 at 03:17
@RichardChambers Sorry I tried your suggestions, but both suggestions make no difference, thanks. — JiaCheng Liu, Apr 17 '19 at 03:46
@n.m. Well, the programming assignment in that website is something like an opencourseware. The scoresheet displays top 10 students sorted by time cost. Public information includes Nickname、Score、Worst case、Time(ms)、Memory(KB). I only know top 1 student's nickname——Snickeen, but that's of no use. Since that website requires user_id and password, so I didn't post that website before. https://dsa.cs.tsinghua.edu.cn/oj/course.shtml?courseid=58 — JiaCheng Liu, Apr 17 '19 at 03:54
@DanielH Simply speaking, denote A as {1,2,…,n}, B as a permutation of {1,2,…,n}. The task is to see whether B is feasible or not by using a stack started from A. For example, if A is {1,2,3,4,5}, B is {1,2,3,5,4}. Given an empty stack, then we can "push 1, pop 1, push 2, pop 2, push 3, pop 3, push 4, push 5, pop 5, pop 4". So {1,2,3,5,4} is feasible from {1,2,3,4,5}. The original question in that website is longer, but that website requires user_id and password. — JiaCheng Liu, Apr 17 '19 at 04:17
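For readers following along, the feasibility check described in this comment can be sketched as a stack simulation. This is only an illustration of the description, not the OP's actual solution: the function name is made up, and the role of `m` is not explained in the thread, so it is ignored here.

```cpp
#include <cassert>
#include <cstddef>

// Returns true if `b` (a permutation of 1..n) can be produced by pushing
// 1, 2, ..., n onto a stack in order and popping at arbitrary times.
// `stack_buf` is caller-provided scratch space with room for n ints.
bool is_stack_permutation(const int* b, std::size_t n, int* stack_buf)
{
    assert(n == 0 || (b != nullptr && stack_buf != nullptr));
    std::size_t top = 0;   // number of elements currently on the stack
    int next = 1;          // next value of A to be pushed
    for (std::size_t i = 0; i < n; ++i) {
        // Push until the wanted value is on top (or A is exhausted).
        while ((top == 0 || stack_buf[top - 1] != b[i]) &&
               next <= static_cast<int>(n))
            stack_buf[top++] = next++;
        if (top == 0 || stack_buf[top - 1] != b[i])
            return false;  // wanted value is buried or unavailable
        --top;             // pop it
    }
    return true;
}
```

Each value is pushed and popped at most once, so the check is linear in n and, as the timings in the question suggest, input parsing dominates the run time.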
@WilliamPursell I tried mmap. I edit the question to include mmap code. I don't know why OnlineJudge says wrong. Hope you can find the bugs.Thanks. — JiaCheng Liu, Apr 17 '19 at 08:51
I think you are wasting your time chasing a black cat in a dark room. We don't know what the other student did. Maybe he used inline assembly. Maybe he was able to exploit a bug in the system. There is no way of knowing that. Your program is doing the job and that's the only thing that counts. Learn to be a better programmer, not to win a few milliseconds by using an obscure API. But whatever floats your boat. — n. m. could be an AI, Apr 17 '19 at 09:11
You generally cannot mmap standard input, so you are either not telling the whole story or doing an entirely wrong thing. — n. m. could be an AI, Apr 17 '19 at 09:25
"OnlineJudge says "Compilation error" Does the judge say anything beyond these exact words? If not then it is unusable and you may want to complain to your university or just accept it and move on, knowing that you are being judged by a dumb machine programmed by unhelpful people. — n. m. could be an AI, Apr 17 '19 at 09:35
@n.m. Thanks for your suggestions. I'm also doubting whether it is valuable or not to spend time on this question. Perhaps I should email the course teacher directly to see if he has any ideas. — JiaCheng Liu, Apr 17 '19 at 14:19
With your code with `std::getline()` you did not change anything, it is not that `std::cin` is slow but formatted input from `std::istream` and it does not matter if you use `std::cin` or instance of `std::stringstream` — Slava, Apr 17 '19 at 16:10
@n.m. mmap works perfectly fine on stdin if stdin is a regular file. — William Pursell, Apr 17 '19 at 17:52
@WilliamPursell That's a great observation, I'm sure it will be worth a few extra points with the online judge. — n. m. could be an AI, Apr 17 '19 at 18:09
Your attempt at `mmap` is (1) totally wrong and (2) useless. Contrary to the commonly held superstition, It will **not** be appreciably faster than normal reading. — n. m. could be an AI, Apr 17 '19 at 18:34
@n.m.: Being able to optimize software is a valuable skill, both for its direct benefits and for the knowledge that is gained along the way. When somebody is interested in learning how to make software faster, please do not discourage them. — Eric Postpischil, Apr 18 '19 at 11:56
@EricPostpischil I hire programmers. I value the ability to optimise *algorithms* over the ability to find a faster canned API to perform a canned task. So I encourage skills that are valuable to me. — n. m. could be an AI, Apr 18 '19 at 12:04
@n.m.: Your needs in the job you work in are not the only valuable ones in the world. There is plenty of code where the algorithms are the best available complexity and what is needed is further optimization that takes advantage of all aspects of performance. Please do not discourage interest in learning. — Eric Postpischil, Apr 18 '19 at 12:57
@EricPostpischil I will encourage and promote what I need, and let others encourage and promote what they need. May the best one win. — n. m. could be an AI, Apr 18 '19 at 13:07
@n.m.: I did not ask you not to encourage or promote. I asked you not to discourage. It is unbecoming and harmful. — Eric Postpischil, Apr 18 '19 at 13:08
@EricPostpischil Thanks a lot to your encouragement. Once I wanted to give up but you said "Being able to optimize software is a valuable skill", so I keep trying other methods. I struggled yesterday and now I use mmap to solve the assignment. Now the total execution time of my code take 40 milliseconds, faster than 52 milliseconds. — JiaCheng Liu, Apr 22 '19 at 04:00

Maxim Egorushkin · Accepted Answer · 2019-04-18T18:30:19.020

A few ideas:

Read integers using std::scanf, not std::istream. The latter is known to be slower for multiple reasons, even with std::ios::sync_with_stdio(false) call.
Read the file by mapping it into memory.
Parse integers faster than scanf and strtol.

Example:

#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;
    for(int i = 0; i < n; ++i)
        if(1 != std::scanf("%d", a + i))
            throw;
}

You can also unroll that scanf loop to read multiple integers in one call. E.g.:

#include <cstdio>

constexpr int step = 64;
char const fmt[step * 3] =
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d "
    "%d %d %d %d %d %d %d %d %d %d %d %d %d %d %d %d"
    ;
int main() {
    int a[1600000];
    int n, m;
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; i += step) {
        int expected = step < n - i ? step : n - i;
        int* b = a + i;
        int read = std::scanf(fmt + 3 * (step - expected),
                         b + 0x00, b + 0x01, b + 0x02, b + 0x03, b + 0x04, b + 0x05, b + 0x06, b + 0x07,
                         b + 0x08, b + 0x09, b + 0x0a, b + 0x0b, b + 0x0c, b + 0x0d, b + 0x0e, b + 0x0f,
                         b + 0x10, b + 0x11, b + 0x12, b + 0x13, b + 0x14, b + 0x15, b + 0x16, b + 0x17,
                         b + 0x18, b + 0x19, b + 0x1a, b + 0x1b, b + 0x1c, b + 0x1d, b + 0x1e, b + 0x1f,
                         b + 0x20, b + 0x21, b + 0x22, b + 0x23, b + 0x24, b + 0x25, b + 0x26, b + 0x27,
                         b + 0x28, b + 0x29, b + 0x2a, b + 0x2b, b + 0x2c, b + 0x2d, b + 0x2e, b + 0x2f,
                         b + 0x30, b + 0x31, b + 0x32, b + 0x33, b + 0x34, b + 0x35, b + 0x36, b + 0x37,
                         b + 0x38, b + 0x39, b + 0x3a, b + 0x3b, b + 0x3c, b + 0x3d, b + 0x3e, b + 0x3f);
        if(read != expected)
            throw;
    }
}

Another option is to parse integers manually (mapping file into memory would help here and there are much faster algorithms for parsing integers than this and standard atoi/strtol, see Fastware - Andrei Alexandrescu):

#include <cctype>
#include <cstdio>

int main() {
    int n, m, a[1600000];
    if(2 != std::scanf("%d %d", &n, &m))
        throw;

    for(int i = 0; i < n; ++i) {
        int r = std::getchar();
        while(std::isspace(r))
            r = std::getchar();
        bool neg = false;
        if('-' == r) {
            neg = true;
            r = std::getchar();
        }
        r -= '0';
        for(;;) {
            int s = std::getchar();
            if(!std::isdigit(s))
                break;
            r = r * 10 + (s - '0');
        }
        a[i] = neg ? -r : r;
    }
}
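The getchar loop above still pays one library call per character. A sketch of the block-reading state machine suggested in the comments, using only `<cstdio>`, could look like the following. `BlockReader` is a hypothetical name, the 64 KiB chunk size is an arbitrary choice, and overflow is not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical block reader: pulls input in 64 KiB chunks with fread and
// hands out one byte at a time from its own buffer.
struct BlockReader {
    std::FILE* file;
    char buf[1 << 16];
    std::size_t pos;
    std::size_t len;

    explicit BlockReader(std::FILE* f) : file(f), pos(0), len(0) {
        assert(f != nullptr);
    }

    // Returns the next byte, or -1 at end of input.
    int next() {
        if (pos == len) {
            len = std::fread(buf, 1, sizeof buf, file);
            pos = 0;
            if (len == 0)
                return -1;
        }
        return static_cast<unsigned char>(buf[pos++]);
    }

    // Parses one optionally negative decimal integer into `out`.
    // Returns false if the input ends before any digit is seen.
    bool read_int(int& out) {
        int c = next();
        while (c != -1 && c != '-' && (c < '0' || c > '9'))
            c = next();                 // skip separators
        bool neg = false;
        if (c == '-') { neg = true; c = next(); }
        if (c < '0' || c > '9')
            return false;
        int r = 0;
        while (c >= '0' && c <= '9') {
            r = r * 10 + (c - '0');
            c = next();                 // also consumes the trailing separator
        }
        out = neg ? -r : r;
        return true;
    }
};
```

Usage would be `BlockReader in(stdin);` followed by `in.read_int(n)`, `in.read_int(m)` and one `in.read_int(a[i])` per element.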

Yet another is to map the file into memory and parse it faster:

#include <boost/iostreams/device/mapped_file.hpp>
#include <cctype>

inline int find_and_parse_int(char const*& begin, char const* end) {
    while(begin != end && std::isspace(*begin))
        ++begin;
    if(begin == end)
        throw;
    bool neg = *begin == '-';
    begin += neg;
    int r = 0;
    do {
        unsigned c = *begin - '0';
        if(c >= 10)
            break;
        r = r * 10 + static_cast<int>(c);
    } while(++begin != end);
    return neg ? -r : r;
}

int main() {
    boost::iostreams::mapped_file f("random-1600000.txt", boost::iostreams::mapped_file::readonly);
    char const* begin = f.const_data();
    char const* end = begin + f.size();
    int n = find_and_parse_int(begin, end);
    int m = find_and_parse_int(begin, end);

    int a[1600000];
    for(int i = 0; i < n; ++i)
        a[i] = find_and_parse_int(begin, end);
}

Benchmark source code.

Note that the results may differ considerably across different versions of compilers and standard libraries:

CentOS release 6.10, g++-6.3.0, Intel Core i7-4790 CPU @ 3.60GHz

---- Best times ----
seconds,    percent, method
0.167985515,  100.0, getchar
0.147258495,   87.7, scanf
0.137161991,   81.7, iostream
0.118859546,   70.8, scanf-multi
0.034033769,   20.3, mmap-parse-faster

Ubuntu 18.04.2 LTS, g++-8.2.0, Intel Core i7-7700K CPU @ 4.20GHz

---- Best times ----
seconds,    percent, method
0.133155952,  100.0, iostream
0.102128208,   76.7, scanf
0.082469185,   61.9, scanf-multi
0.048661004,   36.5, getchar
0.025320109,   19.0, mmap-parse-faster

@MaximEgorushkin Thanks very much for your long long concrete answer. For your first idea, int array[1600000] doesn't cause any problem, but has no effect on execution time. For the second idea, I don't know whether you have read this question or not：https://stackoverflow.com/questions/1042110/using-scanf-in-c-programs-is-faster-than-using-cin What's more, "#include " or are banned. OnlineJudge says "Compilation error",because "No such file or directory". Because your answer is too long, I'll try the rest ideas tomorrow. Thanks. — JiaCheng Liu, Apr 17 '19 at 15:56
@JiaChengLiu In my `istream` benchmark I do call `std::ios::sync_with_stdio(false)`. Without it `istream` takes 355msec. — Maxim Egorushkin, Apr 17 '19 at 16:58
Btw OP also confirms that using automatic array vs dynamically allocated does not affect execution time at all. — Slava, Apr 17 '19 at 17:40
On my machine (this is a server though) it takes 0.00290189s with `new` vs 0.00271056s with stack array to allocate and assign a value into each element for 1.6 million elements. I then calculate sum out of measurement to prevent optimizer to eliminate access. And the time varies btw 2.7ms to 3ms for both variants. I do not really see any difference. — Slava, Apr 17 '19 at 18:27
@MaximEgorushkin Sorry, I'm busy doing homework today. I'll try your ideas some day later, thanks. — JiaCheng Liu, Apr 18 '19 at 15:39
@MaximEgorushkin I tried your ideas yesterday and mmap is the fastest. "cin with sync(false)" take 100 milliseconds, " scanf-multi" take 88 milliseconds, "getline and convert chars to int manually" take 20 milliseconds, "mmap and convert chars to int manually" take 8 milliseconds. Now the total execution time of my code is 40 milliseconds, even faster than the top student's 52 milliseconds. Thanks very much. Coding is interesting. — JiaCheng Liu, Apr 22 '19 at 04:05
@JiaChengLiu You may like to add your benchmark results along with system description to README.md on github. — Maxim Egorushkin, Apr 23 '19 at 09:29
@JiaChengLiu In github version `mmap` is used with `MAP_POPULATE` to reduce the chance of page faults and then forward-only sequential access makes it the best scenario for the hardware prefetcher. — Maxim Egorushkin, Apr 23 '19 at 09:43

Fire Lancer · Answer 2 · 2019-04-17T09:30:15.367

time of my source code is 128 milliseconds. However, the top student only used 52 milliseconds

To run an entire program, this is getting into the area of margin of error. Setting up processes on modern OS takes some time, as will whatever is feeding the input data, and if the server is a shared resource, any resource contention issues. How much does submitting the same exact code vary?

int *exit = new int[1600000];

Memory allocations have a cost. In high performance loops and the like they are often avoided entirely, although a single allocation is unlikely to make a major overall difference.

Input of OnlineJudge is like this:
5 2
1 2 3 5 4
The 1st line is two integers n and m, the 2nd line is n integers separated by spaces. Restrictions are: 1<=n<=1,600,000 and 0<=m<=1,600,000. In order to read more than 1 million integers, my code is like this:

I found that std::cin, etc. can be slow, and in some cases so can the number parsing functions. If you can read, say, the entire line in one go, and then parse that, it may be faster. For parsing, the gains generally come from parsing in unsafe ways if you can guarantee the inputs, e.g.

Is ' ' always the delimiter? Looks like it is, and you can special case the end. E.g. read the entire "line" into a buffer then replace the '\n' with ' '.
Is the number of digits known? Is it always 1, or some other small number like less than 5?
Are the numbers always in the valid range?
Is the input always a valid number, no random chars to check for?
Are there ever negative numbers?

Knowing these things you might make say:

/* 1 or 2 digit int, space delimiter. Advances *p past the consumed digits. */
int parse_small_int(char **p)
{
    int v = (*p)[0] - '0';
    char c2 = (*p)[1];
    if (c2 == ' ') // 1 digit
    {
        (*p) += 1;
    }
    else // assume 2 digit
    {
        v *= 10;
        v += (c2 - '0');
        (*p) += 2;
    }
    return v;
}

Do you have any ideas on improving input speed?

Same goes for output, you don't seem to show the code, but std::cout can be similarly slow. And if you know some things about the numbers and the allowed output format, you can easily beat <<, std::to_string, itoa, etc.

Are leading zeros valid? If they are, you could write a condition-less formatter for the max allowed value.
Do such formatting to a pre-allocated buffer, then print the entire line.

e.g.

// always write 2 chars to p
void format_int_2_digit(int i, char *p)
{
    p[0] = '0' + (i / 10);
    p[1] = '0' + (i % 10);
}
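When the number of digits is not fixed, the same "format into a buffer, print once" idea still works. A minimal sketch follows. The helper name `append_uint` is made up, and it assumes a 32-bit `unsigned`, so at most 10 digits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: writes the decimal digits of `value` followed by `sep`
// into `p` and returns the position just past what was written.
// The caller must guarantee at least 11 free bytes (10 digits plus separator).
char* append_uint(char* p, unsigned value, char sep)
{
    assert(p != nullptr);
    char tmp[10];
    std::size_t n = 0;
    do {                       // produce digits least-significant first
        tmp[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    while (n != 0)             // copy them out in the right order
        *p++ = tmp[--n];
    *p++ = sep;
    return p;
}
```

After filling one large buffer this way, a single `std::fwrite(buf, 1, end - buf, stdout)` replaces many small `<<` calls.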

Another possibility is to bypass the C++ and even C library, although that may not be allowed in your assignment.

For example on Linux you could use the read and write functions with STDIN_FILENO and STDOUT_FILENO. I never actually compared these personally to the CRT versions, but maybe there is a noticeable difference. On Windows there is ReadConsole, WriteConsole, etc., or use GetStdHandle and then ReadFile,WriteFile,etc. Again I never measured these.
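For the Linux case, a rough sketch of reading input with the POSIX `read` call might look like this. `read_fully` is a hypothetical helper, and as the answer says, whether this actually beats the C library would need to be measured.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: reads from `fd` with POSIX read() until `buf` is full
// or end of input is reached. Returns the number of bytes stored.
std::size_t read_fully(int fd, char* buf, std::size_t capacity)
{
    assert(buf != nullptr || capacity == 0);
    std::size_t total = 0;
    while (total < capacity) {
        ssize_t got = read(fd, buf + total, capacity - total);
        if (got > 0) {
            total += static_cast<std::size_t>(got);
        } else if (got == 0) {
            break;                      // end of input
        } else if (errno != EINTR) {
            break;                      // real error: stop with what we have
        }
        // on EINTR, simply retry
    }
    return total;
}
```

For stdin this would be called as `read_fully(STDIN_FILENO, buf, sizeof buf)` with a buffer large enough for the worst-case input. The returned length marks where parsing has to stop.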

You may like to be able to read negative integers as well, including the weird number. — Maxim Egorushkin, Apr 17 '19 at 10:15
That is why I said depends on requirements. If you end up needing at least all of say `strtol`, then you won't gain much or anything trying to beat it, just use `scanf`, `strtol`, etc. But as you start crossing things out of the feature list, sometimes you can beat the runtime functions by a noticeable margin. — Fire Lancer, Apr 17 '19 at 10:26
`strtol` can be beaten by 2x, see [Fastware - Andrei Alexandrescu](https://www.youtube.com/watch?v=o4-CwDo2zpg). — Maxim Egorushkin, Apr 17 '19 at 11:12
@MaximEgorushkin Is there code there? I glanced through it, I didn't see a *feature complete* replacement, so I think it falls into what I said, go for something specific, and you can beat even the best optimised CRT. Just reading only 1 or 2 digits will be a lot more than 2 times faster, and exactly one digit is even faster, and all such things should be a lot faster than `scanf` and certainly `istream >>`. Depends on requirements where the best solution is. — Fire Lancer, Apr 17 '19 at 11:17
There is code and it is a drop in replacement for standard ones, i.e. feature complete. I'll leave it for you to explore. — Maxim Egorushkin, Apr 17 '19 at 11:25
@MaximEgorushkin Which one exactly, I looked through again and they are all good optimisation tips,sure. I was thinking like how `strtol` does the automatic base `0`, `0x`, `0X` detection, or takes a `base` parameter. Or how `strtoi` just doesn't exist so you potentially need an extra range check if you wanted `int` specifically (in the case of wanting complete safety, e.g. on 64bit linux where `long` has a larger range than `int`). — Fire Lancer, Apr 17 '19 at 11:34
@MaximEgorushkin Either way, my point was that you can beat `scanf`, `cin`, etc. Is the issue/downvote just that I suggested the possibility of an **unsafe** parse, which while having the potential to be much faster than the best optimised solution, is of course as stated, unsafe? — Fire Lancer, Apr 17 '19 at 11:37
