How big, approximately, is the overhead of an I/O syscall on Linux from a C program? That is, how bad are many small read/write calls compared with read/write on large buffers (on regular files or network sockets)? The app is heavily multithreaded.

Cartesius00
- It's 20% bad. What's your performance metric? – Greg Hewgill Nov 23 '11 at 18:35
- I/O is usually one of, if not the, slowest parts of a program. – Seth Carnegie Nov 23 '11 at 18:35
- @GregHewgill, for perf overhead, most people use time. E.g. it's 2ms bad. I suppose there is some existential debate on the nature of time, but that's what I would use. – Paul Draper Aug 14 '17 at 00:34
2 Answers
Syscalls take at least 1-2 microseconds on most modern machines just for the syscall overhead, and much more time if they're doing anything complex that could block or sleep. Expect at least 20 microseconds and up to the order of milliseconds for IO. Compare this with a tiny function call or macro that reads a byte from a userspace buffer, which is likely to complete in a matter of nanoseconds (maybe 200 ns on a bad day).
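As a rough way to reproduce that number on your own machine, here is a minimal sketch (my own illustration, not part of the original answer): it times a near-no-op syscall in a tight loop. `SYS_getpid` via `syscall(2)` is used so that no libc-level caching can skip the kernel entry; the iteration count is an arbitrary choice.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    const long iters = 1000000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);          /* forces a real kernel round trip */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per syscall\n", ns / iters);
    return 0;
}
```

The per-call figure varies considerably between machines, and can be several times higher with CPU vulnerability mitigations enabled.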

R.. GitHub STOP HELPING ICE
- +1, Thank you for not being all mystical and nihilistic about performance issues like so many are. – Seth Carnegie Nov 23 '11 at 18:44
- Yes, it's not the `syscall` overhead you have to worry about -- it's the work that the system call does that's slow! – Gabe Nov 23 '11 at 18:58
- @Gabe: It's not entirely that simple, at least not in all cases. For performing actual IO, the work is moderately expensive, but for multiplexing IO (`select`), timing (`nanosleep`), signal control (`sigprocmask`), synchronization objects (`futex`), and many other cases, the overall time is actually dominated by syscall overhead and not by any work being done. In fact for many purposes, the constant factor from syscall overhead is so large that it's practical to only count syscalls (i.e. consider everything else `O(0)`) in figuring how time cost scales with real-world-size data. – R.. GitHub STOP HELPING ICE Nov 23 '11 at 23:59
- @Gabe: In 2004 I sped up gnome-terminal startup from over a second to under 100 ms just by eliminating the 4096 `close()` syscalls that it did. In fact, between the fork/exec of the launcher and the program start it did them twice. My patch didn't get taken in; they ended up doing something a bit fancier. – Zan Lynx Jan 20 '12 at 18:16
- @Zan: A really cool trick if you need to close all file descriptors but don't want to waste so much time trying `close` on every possible one: the `poll` interface can probe an arbitrarily long list of file descriptors and tell you which ones are valid and which aren't, all with a single syscall. Then you only have to `close` the ones `poll` determined were valid (sketched in code after this comment thread). – R.. GitHub STOP HELPING ICE Jan 22 '12 at 02:18
- @R.. That's a good one. The Linux-only method many applications use now is to open `/proc/self/fd` and close the descriptors listed there. – Zan Lynx Jan 22 '12 at 19:42
- The Linux-only method is also a lot slower (many syscalls, including IO on `/proc` which is notoriously slow, instead of just one). – R.. GitHub STOP HELPING ICE Jan 22 '12 at 19:47
- @R.. does the 1-2 microsecond syscall overhead include both entering the kernel and returning to user space? – incompetent Apr 01 '16 at 13:59
- @R..: Did you ever measure this? I get 62 ns on my machine, not 1-2 us. – user541686 Mar 17 '18 at 07:59
- @Mehrdad: If you get 62ns you're not making a syscall. You're probably calling a function that you expect to be a syscall, but that ends up returning something accessible-from or cached-in userspace, like getpid or time. Lowest cycle overhead I've ever seen for syscalls (according to rdtsc) is about 600 cycles, and you'd need about 9-10 GHz to do that in 62 ns. – R.. GitHub STOP HELPING ICE Mar 17 '18 at 15:39
- @R..: Maybe clarify in your answer what you mean by "*just for the syscall overhead*"? I landed here when I did some Googling and I thought you meant a `syscall` is 1-2 microseconds... – user541686 Mar 18 '18 at 03:15
- _How long does it take to make a context switch?_ is a detailed benchmark of Linux that provides C code for replication: https://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html – Lassi Jul 21 '19 at 07:16
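For concreteness, the poll trick from the comments above might look roughly like this (my own sketch of the idea, not code from the thread; `MAX_FD` is an assumed bound that a real program would take from `getrlimit(RLIMIT_NOFILE)`):

```c
#include <poll.h>
#include <unistd.h>

#define MAX_FD 1024  /* assumed bound; query RLIMIT_NOFILE in real code */

static void close_open_fds(int lowfd) {
    struct pollfd pfd[MAX_FD];
    nfds_t n = 0;
    for (int fd = lowfd; fd < MAX_FD; fd++) {
        pfd[n].fd = fd;
        pfd[n].events = 0;            /* only POLLNVAL in revents matters */
        n++;
    }
    poll(pfd, n, 0);                  /* one syscall probes the entire list */
    for (nfds_t i = 0; i < n; i++)
        if (!(pfd[i].revents & POLLNVAL))
            close(pfd[i].fd);         /* descriptor is open: close it */
}
```

`poll` sets `POLLNVAL` in `revents` for descriptors that aren't open, so one probing syscall replaces thousands of speculative `close` calls. (Modern Linux also offers `close_range(2)`, but that postdates this thread.)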
You can measure this yourself. Just open `/dev/zero` and do some reading and writing while measuring the time. Also vary the number of bytes you put into each call, e.g. 1 byte, 2 bytes, 128 bytes, ... 4096 bytes. Take care to use the `read(2)` and `write(2)` syscalls directly and not anything that uses internal buffers.
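A minimal sketch of that measurement might look like this (my own illustration, not the answer author's code; the iteration count and size list are arbitrary choices):

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0) return 1;

    size_t sizes[] = {1, 2, 128, 4096};
    char buf[4096];
    const long iters = 100000;

    for (size_t s = 0; s < sizeof sizes / sizeof sizes[0]; s++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            if (read(fd, buf, sizes[s]) < 0)   /* raw syscall, no stdio */
                return 1;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%4zu-byte reads: %8.1f ns/call, %7.2f ns/byte\n",
               sizes[s], ns / iters, ns / iters / (double)sizes[s]);
    }
    close(fd);
    return 0;
}
```

The per-call cost should stay roughly flat as the buffer grows, while the per-byte cost drops sharply, which is exactly the overhead the question asks about.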

A.H.
- While this is a good idea, `/dev/zero` does not include any actual I/O so it may not give accurate answers. – Gabe Nov 23 '11 at 18:57
- @Gabe: That's exactly the point. James wanted to know the I/O _syscall overhead_, not the time for the actual data transfer to the disks. Doing the I/O on a RAM based block device gives a very good approximation of this overhead. – A.H. Nov 23 '11 at 19:05
- To illustrate my point, if you do lots of writes to a network socket, it may send many small packets rather than fewer large ones. This could cause congestion on the network and slow things down. If that were the case, you'd never discover it by just writing to `/dev/zero` (see the sketch after these comments). – Gabe Nov 23 '11 at 19:45
- Actually `/dev/zero` is **slower** than real file IO, in my experience, assuming the real files all fit in cache. – R.. GitHub STOP HELPING ICE Nov 24 '11 at 00:00
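On Gabe's socket caveat: the usual remedy for the small-write pattern is to coalesce in userspace, so that both the syscall count and the number of small writes handed to the kernel go down. A minimal sketch (my own illustration, not from the thread; `outbuf`/`buf_putc` are hypothetical helpers, essentially what stdio buffering does internally):

```c
#include <string.h>
#include <unistd.h>

/* Tiny userspace output buffer: bytes accumulate in memory and reach
 * the descriptor with one write() per 4096 bytes instead of one write()
 * per byte. */
struct outbuf { int fd; size_t len; char data[4096]; };

static void buf_flush(struct outbuf *b) {
    if (b->len > 0) {
        (void)write(b->fd, b->data, b->len);  /* one syscall per batch */
        b->len = 0;
    }
}

static void buf_putc(struct outbuf *b, char c) {
    b->data[b->len++] = c;
    if (b->len == sizeof b->data)             /* buffer full: flush */
        buf_flush(b);
}
```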