34

I am using GCC 4.8 and glibc 2.19 on an x86_64 Linux.

While playing with different input methods for a different question, I compared fscanf and sscanf. Specifically, I would either use fscanf on the standard input directly:

char s[128]; int n;

while (fscanf(stdin, "%127s %d", s, &n) == 2) { }

Or I would first read the entire input into a buffer and then traverse the buffer with sscanf. (Reading everything into the buffer takes a tiny amount of time.)

char s[128]; int n;
char const * p = my_data;

for (int b; sscanf(p, "%127s %d%n", s, &n, &b) == 2; p += b) { }

To my surprise, the fscanf version is vastly faster. For example, processing seveal tens of thousands of lines with fscanf takes this long:

10000       0.003927487 seconds time elapsed
20000       0.006860206 seconds time elapsed
30000       0.007933329 seconds time elapsed
40000       0.012881912 seconds time elapsed
50000       0.013516816 seconds time elapsed
60000       0.015670432 seconds time elapsed
70000       0.017393129 seconds time elapsed
80000       0.019837480 seconds time elapsed
90000       0.023925753 seconds time elapsed

Now the same with sscanf:

10000       0.035864643 seconds time elapsed
20000       0.127150772 seconds time elapsed
30000       0.319828373 seconds time elapsed
40000       0.611551668 seconds time elapsed
50000       0.919187459 seconds time elapsed
60000       1.327831544 seconds time elapsed
70000       1.809843039 seconds time elapsed
80000       2.354809588 seconds time elapsed
90000       2.970678416 seconds time elapsed

I was using the Google perf tools to measure this. For example, for 50000 lines, the fscanf code requires about 50M cycles, and the sscanf code about 3300M cycles. So I broke down the top call sites with perf record/perf report. With fscanf:

 35.26%  xf  libc-2.19.so         [.] _IO_vfscanf
 23.91%  xf  [kernel.kallsyms]    [k] 0xffffffff8104f45a
  8.93%  xf  libc-2.19.so         [.] _int_malloc

And with sscanf:

 98.22%  xs  libc-2.19.so         [.] rawmemchr
  0.68%  xs  libc-2.19.so         [.] _IO_vfscanf
  0.38%  xs  [kernel.kallsyms]    [k] 0xffffffff8104f45a

So almost all of the time with sscanf is spent in rawmemchr! Why is this? How can the fscanf code avoid this cost?

I tried searching for this, but the best I could come up with is this discussion of locked realloc calls which I don't think applies here. I was also thinking that fscanf has better memory locality (using the same buffer over and over), but that can't make such a big difference.

Does anyone have any insights in this strange discrepancy?

Community
  • 1
  • 1
Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • Complete code for: [`fscanf`](https://ideone.com/sIr8lN), [`sscanf`](http://ideone.com/R8SQeJ) – Kerrek SB May 29 '14 at 00:27
  • I have trouble finding the source code for `_IO_vfscanf`. [This](http://www.opensource.apple.com/source/gcc/gcc-934.3/libio/iovfscanf.c) is the best I could find, but that isn't necessarily glibc 2.19. – Kerrek SB May 29 '14 at 00:31
  • 2
    Show the loop processing - it looks like you have a ["Schlemiel the Painter" problem](http://www.joelonsoftware.com/articles/fog0000000319.html). – Michael Burr May 29 '14 at 00:32
  • @MichaelBurr: I linked the test code, and have posted the loops in the question. Do you think `sscanf` scans to the end of the string each time? That would contradict the value stored in `b`, which has the expected value (i.e. one line of input is consumed in each call). – Kerrek SB May 29 '14 at 00:33
  • 1
    @MichaelBurr: Actually, I think Michael Burr is right, it appears that `sscanf` is searching the _entire file_ for the trailing null, and then parsing out the three variables you want. Look at the example on http://linux.die.net/man/3/rawmemchr – Mooing Duck May 29 '14 at 00:49
  • @MichaelBurr: He does, but the Schlemiel the Painter Problem seems to be in glibc itself, not in OP's code. – R.. GitHub STOP HELPING ICE May 29 '14 at 01:00

2 Answers2

34

sscanf() converts the string you pass in to an _IO_FILE* to make the string look like a "file". This is so the same internal _IO_vfscanf() can be used for both a string and a FILE*.

However, as part of that conversion, done in a _IO_str_init_static_internal() function, it calls __rawmemchr (ptr, '\0'); essentially a strlen() call, on your input string. This conversion is done on every call to sscanf(), and since your input buffer is rather large, it'll spend a fair amount of time calculating the length of the input string.

Creating a FILE* from the input string using fmemopen() and use fscanf() could be another alternative.

nos
  • 223,662
  • 58
  • 417
  • 506
  • 16
    I would suggest filing a bug report against glibc. This issue could definitely be fixed in principle by making the virtual `FILE` provided by `sscanf` use custom ops that don't require advance knowledge of the string length. Actually our implementation in musl libc avoids the issue, so I know it's possible. :-) – R.. GitHub STOP HELPING ICE May 29 '14 at 02:55
  • @R..: I had never heard of musl before -- thanks for pointing it out! – Kerrek SB May 29 '14 at 19:32
8

It looks like glibc's sscanf() scans the source string for the length before doing anything else.

sscanf() (in stdio-common/sscanf.c) is essentially a wrapper around a call to _IO_vsscanf() (in libio/iovsscanf.c). And one of the first things that _IO_vsscanf() does is initialize its own _IO_strfile structure by calling _IO_str_init_static_internal() (in libio/strops.c) which calculates the length of the string if it's not provided.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760