Faster than scanf?

Question

I was doing massive parsing of positive integers using scanf("%d", &someint). As I wanted to see if scanf was a bottleneck, I implemented a naive integer parsing function using fread, just like:

int result;
char c;

while (fread(&c, sizeof c, 1, stdin), c == ' ' || c == '\n')
    ;

result = c - '0';
while (fread(&c, sizeof c, 1, stdin), c >= '0' || c <= '9') {
     result *= 10;
     result += c - '0';
}

return result;

But to my astonishment, the performance of this function is (even with inlining) about 50% worse. Shouldn't there be a possibility to improve on scanf for specialized cases? Isn't fread supposed to be fast (additional hint: Integers are (edit: mostly) 1 or 2 digits)?

`scanf`'s probably optimized by the compiler. `fread` is almost definitely *not* optimized to use STDIN as an input. — Ry-, Dec 12 '11 at 23:48
fread() is certainly not intended to read one character at a time. Presumably even fgetc() performs better. — wildplasser, Dec 12 '11 at 23:51
If you think the bottleneck might be text conversion, perhaps a better response is to try storing your data as binary rather than text. — asveikau, Dec 13 '11 at 00:12
@asveikau No sense in changing his storage format unless he *knows* that it's a bottleneck. — jforberg, Dec 13 '11 at 00:19
@jforberg - I agree, but that's a different question. If he is truly suspicious of it (or simply curious) he can try implementing it and compare results. — asveikau, Dec 13 '11 at 00:27
On a nit-picking point of view, I suggest using `isdigit()` instead of `c>='0' && c<='9'`, since digits may not be contiguous in every character set, e.g. EBCDIC. — Philip, Dec 13 '11 at 00:52
@Philip: Digits are consecutive in EBCDIC, even though letters are not. Special-purpose representations of alphanumeric data may use non-consecutive codings (e.g. Baudot code, used in telecommunications devices for the deaf, maps the digits 1234567890 to the same codes as letters QWERTYUIOP, which are mapped non-consecutively). I'm unaware of any code which would be used to store data in something one can "fread" from, which would not have digits consecutive. — supercat, Feb 20 '12 at 18:19
`isDigit()` is certainly faster, and not just better because it doesn't assume contiguity: it's faster because it's a single test rather than two, with no internal branch. — user207421, Jan 17 '15 at 02:01
For completeness, `isdigit` is implemented as `(unsigned)c-'0' < 10` in musl libc which is indeed only a single test. In the meantime I changed my test to `c >= '0'` which works for my purposes (only whitespace and digits guaranteed for ASCII input) and is even faster. But the main speedup comes from using `unlocked_stdio(3)`. — Jo So, Jan 17 '15 at 12:06
@Philip Inexact; The C Standards have _all_ stated, in 5.2.1 Character sets clause 3, _"In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous"_. It follows that `c>='0' && c<='9'` encompasses all decimal digits. — Iwillnotexist Idonotexist, Jan 17 '15 at 18:24

Jo So · Accepted Answer · 2015-01-18T00:44:47.347

The overhead I was encountering was not the parsing itself but the many calls to fread (same with fgetc, and friends). For each call, the libc has to lock the input stream to make sure two threads aren't stepping on each other's feet. Locking is a very costly operation.

What we're looking for is a function that gives us buffered input (reimplementing buffering is just too much effort) but avoids the huge locking overhead of fgetc.

If we can guarantee that there is only a single thread using the input stream, we can use the functions from unlocked_stdio(3), such as getchar_unlocked(3). Here is an example:

static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while (isdigit((c = getchar_unlocked())))
        n = 10*n + c-'0';

    return n;
}

The above version doesn't check for errors. But it's guaranteed to terminate. If error handling is needed it might be enough to check feof(stdin) and ferror(stdin) at the end, or let the caller do it.

My original purpose was submitting solutions to programming problems at SPOJ, where the input is only whitespace and digits. So there is still room for improvement, namely the isdigit check.

static int parseint(void)
{
    int c, n;

    n = getchar_unlocked() - '0';
    while ((c = getchar_unlocked()) >= '0')
        n = 10*n + c-'0';

    return n;
}

It's very, very hard to beat this parsing routine, both performance-wise and in terms of convenience and maintainability.

Timothy Jones · Answer 2 · 2011-12-13T00:08:36.103

4

You'll be able to improve significantly on your example by buffering - read a large number of characters into memory, and then parse them from the in-memory version.

If you're reading from disk you might get a performance increase by your buffer being a multiple of the block size.

Edit: You can let the kernel handle this for you by using mmap to map the file into memory.

edited Dec 13 '11 at 00:08

answered Dec 12 '11 at 23:51

Timothy Jones

21,495
6
60
90

1

I'd like to doubt that. I'm pretty sure that libc does buffering itself. Unless you're using system calls (read(2)/write(2)), implementing your own buffer is likely to slow things down I think... – Jo So Dec 13 '11 at 00:10
1

Give it a try - happy to be wrong :) I remember manual buffering dramatically speeding up code that used to be a getc(), BUT that might have been stdin, which isn't block buffered. You could also try giving `setvbuf()` a bigger buffer. – Timothy Jones Dec 13 '11 at 00:15
1

@JoSo: Just for kicks, I read a 372MB file with calls to fread with 1 byte at a time and it took nearly 30 seconds. Reading 4K at a time took 1.3 seconds. Of course, you have already figured this out. Nonetheless, accepting this answer would probably be good. – Mark Wilkins Dec 13 '11 at 00:22
This wasn't exactly an answer to my question, which was on beating scanf on a more narrow field (`scanf(%d, ...)` is not supposed to do buffering, either!). Anyways accepted because no one else would want to write an answer ;-) – Jo So Dec 13 '11 at 00:28
@JoSo Yes, stdio will buffer, but it is more likely that the bottleneck is simply the 372 million calls to fgetc rather than any side-effects of double buffering. – jørgensen Dec 13 '11 at 02:04

score 1 · Answer 3 · answered Jan 17 '15 at 01:45

1

Here's something I use.

 #define scan(x) do{while((x=getchar())<'0'); for(x-='0'; '0'<=(_=getchar()); x=(x<<3)+(x<<1)+_-'0');}while(0)
 char _;

However, this only works with Integers.

answered Jan 17 '15 at 01:45

Boris

805
1
8
20

1

This works great! Although, I suggest to use getcx() or getchar_unlocked() instead of getchar(). – diegocaro Dec 17 '15 at 14:42
1

What does `char _;` do? – Dec 30 '16 at 22:34
@light2yellow the `char _;` defines a variable whose name is the underscore character, `_`, which is used in the macro itself such as `(_=getchar())`. The intent is to have a global variable which is defined where the macro is defined that the macro then uses. I would probably use `static char _;` instead in order to limit the scope to the file or compilation unit. Probably not a good idea for multi-threaded applications. Really should be a function, not a macro. Reads input for first digit then converts a series of digits into an integer value. stops reading at first non-digit. – Richard Chambers Jul 14 '18 at 11:33

score -2 · Answer 4 · answered Dec 13 '11 at 00:37

From what you say, I derive the following facts:

numbers are in the range of 0-99, which accounts for 10+100 different strings (including leading zeros)
you trust that your input stream adheres to some sort of specification and won't contain any unexpected character sequences

In that case, I'd use a lookup table to convert strings to numbers. Given a string s[2], the index to your lookup table can be computed by s[1]*10 + s[0], swapping the digits and making use of the fact that '\0' equals 0 in ASCII.

Then, you can read your input in the following way:

// given our lookup method, this table may need padding entries
int lookup_table[] = { /*...*/ };

// no need to call superfluous functions
#define str2int(x) (lookup_table[(x)[1]*10 + (x)[0]])

while(read_token_from_stream(stdin, buf))
        next_int = str2int(buf);

On today's machines, it will be hard to come up with a faster technique. My guess is that this method will likely run 2 to 10 times faster than any scanf()-based approach.

Sorry, I meant "most" of the numbers are 1 or 2 digits. Anyway, where's the point in using a lookup table? I can't see where you are saving cycles. (You are saving the `c - '0'` part from my solution, but on the other hand you have more redirection) — Jo So, Dec 13 '11 at 00:43
Jo So: I'm also saving the `c >= '0' || c <= '9'` and `c == ' ' || c == '\n'` parts with at most three instructions each. But admittedly, a true subset of these operations is needed in what I abbreviated with `read_token_from_stream()`. — Philip, Dec 13 '11 at 00:49

Faster than scanf?

4 Answers4