I'm trying to use vmsplice to replace write when writing to a pipe, because write seems to suffer a huge slowdown through a pipe: on my computer it runs at about 0.1 times the speed of writing directly without a pipe. According to this post, write is slow because it has to copy the buffer into the pipe, while vmsplice can do the same job without copying.
In the code, outw and outv are meant to do the same job. I wrote outv the same way the author of the linked post wrote it in assembly:
mov [%rip + iovec_base], OUTPUT_PTR
mov [%rip + iovec_base + 8], %rdx
mov ARG1e, 1
lea ARG2, [%rip + iovec_base]
mov ARG3e, 1
xor ARG4e, ARG4e
1: mov SYSCALL_NUMBER, __NR_vmsplice
syscall
call exit_on_error
add [ARG2], SYSCALL_RETURN
sub [ARG2 + 8], SYSCALL_RETURN
jnz 1b
This is my code. When running it, always send the output through a pipe, as in ./a.out|cat. Otherwise, vmsplice fails because standard output is not a pipe, and the program aborts.
#define _GNU_SOURCE
#include <stdbool.h>
#include <stdalign.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#define S 0x10
alignas(0x1000) static char b[S];
void outw(int n) {
write(1, b, n);
}
void outv(int n) {
struct iovec iov = {b, n};
do {
if ((n = vmsplice(1, &iov, 1, 0)) < 0) abort();
iov.iov_base = (char *)iov.iov_base + n;
iov.iov_len -= n;
} while (iov.iov_len);
}
#define _(f, n) do {\
for (int i = 0; i < 3; ++i) {\
memset(b, i + '0', (n) - 1);\
b[(n) - 1] = '\n';\
f(n);\
}\
} while (false)
int main() {
_(outw, S);
_(outw, S - 1);
write(1, "---\n", 4);
_(outv, S);
_(outv, S - 1);
}
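Regarding the note above about always running through a pipe: the vmsplice(2) man page lists EBADF for a descriptor that does not refer to a pipe. A minimal sketch of guarding against that is shown here, assuming a fallback to write is acceptable. The names fd_is_pipe and out_auto are hypothetical helpers, not part of the original program.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* vmsplice (needs _GNU_SOURCE), open */
#include <stdbool.h>
#include <sys/stat.h>   /* fstat, S_ISFIFO */
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

/* Hypothetical helper: true if fd refers to a pipe (FIFO). */
static bool fd_is_pipe(int fd) {
    struct stat st;
    return fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode);
}

/* Hypothetical helper: one vmsplice call on a pipe, one write call otherwise.
   Returns the number of bytes accepted, or -1 on error (same as write). */
static ssize_t out_auto(int fd, void *buf, size_t n) {
    assert(buf != NULL);
    if (fd_is_pipe(fd)) {
        struct iovec iov = { .iov_base = buf, .iov_len = n };
        return vmsplice(fd, &iov, 1, 0);
    }
    return write(fd, buf, n);
}
```

A caller would still need to loop on short counts, as outv does. This only removes the abort when standard output is a terminal or a file; it does not address the buffer-reuse problem described below.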
The expected output is,
000000000000000
111111111111111
222222222222222
00000000000000
11111111111111
22222222222222
---
000000000000000
111111111111111
222222222222222
00000000000000
11111111111111
22222222222222
but for the second part I get,
22222222222222
22222222222222
22222222222222
22222222222222
22222222222222
22222222222222
When I add this line at the start of main,
fcntl(1, F_SETPIPE_SZ, S);
the second output is a bit better, but still not good.
111111111111111
222222222222222
000000000000000
11111111111111
22222222222222
22222222222222
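One thing that may be worth checking at this point is what capacity the pipe actually ends up with. As far as the fcntl(2) man page describes it, a request below the page size is rounded up to the page size, so asking for S = 0x10 probably does not give a 16-byte pipe. A small sketch for querying the result follows; request_pipe_size is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* F_SETPIPE_SZ, F_GETPIPE_SZ (need _GNU_SOURCE) */
#include <unistd.h>

/* Hypothetical helper: request a pipe capacity, then report the capacity
   the kernel actually set. Returns -1 on error. */
static int request_pipe_size(int fd, int requested) {
    assert(requested > 0);
    if (fcntl(fd, F_SETPIPE_SZ, requested) < 0)
        return -1;
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Calling request_pipe_size(1, S) at the start of main and printing the result to stderr would show whether the pipe size really matches the buffer size.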
I tried matching the size of the buffer to be written to the pipe size by commenting out these lines.
//_(outw, S - 1);
//_(outv, S - 1);
Still, the top and the bottom don't match.
000000000000000
111111111111111
222222222222222
---
111111111111111
222222222222222
222222222222222
So what am I doing wrong, and how do I make outv do the same job as outw, but without copying?
I kind of solved the problem by setting the buffer size to at least 0x10000 (65536) and setting the pipe size to match. I'm not entirely sure, but it seems that nothing happens until the pipe is full, and when it is full, whatever routine handles the output assumes it can still copy from the same buffer for the earlier calls to vmsplice, without caring whether the contents of the buffer have changed.
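The hypothesis above can be probed in isolation, without a second process. The sketch below is only a test under stated assumptions (Linux, a fresh pipe created with pipe, a 16-byte page-aligned buffer): it splices the buffer into the pipe, overwrites the buffer, and only then reads the pipe. The names probe_buf and byte_seen_after_overwrite are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* vmsplice (needs _GNU_SOURCE) */
#include <stdalign.h>
#include <string.h>
#include <sys/uio.h>    /* struct iovec */
#include <unistd.h>

alignas(0x1000) static char probe_buf[0x10];

/* Hypothetical probe: vmsplice a buffer of '0' bytes into a fresh pipe,
   overwrite the buffer with '1' bytes, then read the pipe.
   Returns the first byte the reader sees, or -1 on error. */
static int byte_seen_after_overwrite(void) {
    int p[2];
    char seen[sizeof probe_buf];
    int result = -1;

    if (pipe(p) != 0)
        return -1;

    memset(probe_buf, '0', sizeof probe_buf);
    struct iovec iov = { .iov_base = probe_buf, .iov_len = sizeof probe_buf };
    ssize_t n = vmsplice(p[1], &iov, 1, 0);
    if (n > 0) {
        assert((size_t)n <= sizeof probe_buf);
        /* Reuse the buffer before anything has read from the pipe. */
        memset(probe_buf, '1', sizeof probe_buf);
        ssize_t r = read(p[0], seen, (size_t)n);
        if (r > 0)
            result = (unsigned char)seen[0];
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

If the function returns '1', the pipe was holding a reference to the buffer's pages rather than a copy of the data, which would be consistent with the outputs shown earlier. A return of '0' would suggest the data was copied at vmsplice time.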
I thought I had solved the problem, but I hadn't. I still get unexpected output in the actual program, where I matched both the output buffer size and the pipe size to 0x100000. Everything works fine with write apart from the very slow speed, so the problem is in the way I'm using vmsplice. The man page for this system call isn't clear about what exactly the call does and what can happen in which cases.