
I am writing an application server that processes images (large data). I am trying to minimize copies when sending image data back to clients. The processed images I need to send to clients are in buffers obtained from jemalloc. The ways I have thought of to send the data back to the client are:

1) Simple write call.

// Allocate buffer buf.
// Store image data in this buffer.
write(socket, buf, len);

2) I obtain the buffer through mmap instead of jemalloc, though I presume jemalloc already creates the buffer using mmap. I then make a simple call to write.

buf = mmap(file, len);  // Imagine proper options.
// Store image data in this buffer.
write(socket, buf, len);

3) I obtain a buffer through mmap like before. I then use sendfile to send the data:

buf = mmap(in_fd, len);  // Imagine proper options.
// Store image data in this buffer.
off_t offset = 0;
ssize_t rc = sendfile(out_fd, in_fd, &offset, len);  // in_fd is the file backing buf.
// Deal with rc.

It seems like (1) and (2) will probably do the same thing given jemalloc probably allocates memory through mmap in the first place. I am not sure about (3) though. Will this really lead to any benefits? Figure 4 on this article on Linux zero-copy methods suggests that a further copy can be prevented using sendfile:

no data is copied into the socket buffer. Instead, only descriptors with information about the whereabouts and length of the data are appended to the socket buffer. The DMA engine passes data directly from the kernel buffer to the protocol engine, thus eliminating the remaining final copy.

This seems like a win if everything works out. I don't know if my mmapped buffer counts as a kernel buffer though. I also don't know when it is safe to re-use this buffer. Since the fd and length are the only things appended to the socket buffer, I assume that the kernel actually writes this data to the socket asynchronously. If it does, what does the return from sendfile signify? How would I know when to re-use this buffer?

So my questions are:

  1. What is the fastest way to write large buffers (images in my case) to a socket? The images are held in memory.
  2. Is it a good idea to call sendfile on a mmapped file? If yes, what are the gotchas? Does this even lead to any wins?
Rajiv

2 Answers


It seems like my suspicions were correct. I got my information from this article. Quoting from it:

Also these network write system calls, including sendfile, might and in many cases do return before the data sent over TCP by the method call has been acknowledged. These methods return as soon as all data is written into the socket buffers (sk buff) and is pushed to the TCP write queue, the TCP engine can manage alone from that point on. In other words at the time sendfile returns the last TCP send window is not actually sent to the remote host but queued. In cases where scatter-gather DMA is supported there is no seperate buffer which holds these bytes, rather the buffers(sk buffs) just hold pointers to the pages of OS buffer cache, where the contents of file is located. This might lead to a race condition if we modify the content of the file corresponding to the data in the last TCP send window as soon as sendfile is returned. As a result TCP engine may send newly written data to the remote host instead of what we originally intended to send.

Even if the buffer from a mmapped file is considered "DMA-able", it seems there is no way to know when it is safe to re-use it without an explicit acknowledgement (over the network) from the actual client. I might have to stick to simple write calls and incur the extra copy. There is a paper (also from the article) with more details.

Edit: This article on the splice call also shows the problems. Quoting it:

Be aware, when splicing data from a mmap'ed buffer to a network socket, it is not possible to say when all data has been sent. Even if splice() returns, the network stack may not have sent all data yet. So reusing the buffer may overwrite unsent data.
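
One thing that might help with the "when is it safe" question is asking the kernel how much data is still sitting in the socket's send queue. The following is only a minimal sketch, assuming Linux and a TCP socket: tcp(7) documents the SIOCOUTQ ioctl as returning the amount of unsent data in the socket send queue. The helper name send_queue_bytes is hypothetical, and treating a zero result as proof that the pages handed over by sendfile or splice are no longer referenced is an assumption here, not something the quoted articles confirm.

```c
#include <linux/sockios.h>  /* SIOCOUTQ */
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the number of bytes the kernel still holds
 * in the send queue of `sock`, or -1 if the ioctl fails (errno is set). */
int send_queue_bytes(int sock)
{
    int pending = 0;
    if (ioctl(sock, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}
```

A caller could poll this after sendfile returns and postpone re-using the mmapped buffer until the value drops to zero, at the cost of extra system calls and added latency.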

Barmar
Rajiv
  • If you're using sendfile() there is no reason to use mmap() at all. Just open() the file to get an FD and pass it to sendfile(). – user207421 Nov 16 '13 at 21:35
  • My data is in memory. I don't have a file per se. I was attempting to use sendfile on a mmapped file to reduce the number of copies made while writing my in-memory data (buffer from mmapped file) to a socket. – Rajiv Nov 16 '13 at 21:53
  • An mmapped file is a file. An mmapped buffer is not a kernel buffer. – user207421 Nov 16 '13 at 22:32
  • @EJP please look at the Lighttpd experiment in the paper I linked to in this answer. A mmapped file can be treated as a buffer (shared between the kernel and the user) that is "DMA-able" if the NIC supports scatter-gather DMA. Such a NIC can pull data straight from memory and write it onto the wire without any copies. The Linux Journal article in the original question shows how sendfile achieves this. – Rajiv Nov 16 '13 at 22:40

For cases 1 and 2 - does the operation you marked as // Store image data in this buffer require any conversion? Is it just plain copy from the memory to buf?

If it's just plain copy, you can use write directly on the pointer obtained from jemalloc.

Assuming that img is a pointer obtained from jemalloc and size is the size of your image in bytes, just run the following code:

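#include <errno.h>
#include <unistd.h>

ssize_t result;          /* write returns ssize_t, not int */
size_t sent = 0;
const char *p = img;     /* byte pointer, so that p + sent is valid arithmetic */
while (sent < size) {
    result = write(socket, p + sent, size - sent);
    if (result < 0) {
        if (errno == EINTR)
            continue;    /* interrupted by a signal: retry */
        /* error handling here */
        break;
    }
    sent += (size_t)result;
}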

This works correctly for blocking I/O (the default behavior): write may send fewer bytes than requested, so the loop keeps calling it until the whole image has gone out. If you need to write the data in a non-blocking manner, the same loop can be reworked to handle EAGAIN.
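
For the non-blocking case, here is a minimal sketch of one way to rework the loop, assuming the descriptor has already been put into O_NONBLOCK mode and that blocking in poll until the socket is writable is acceptable. The helper name write_all_nonblocking is hypothetical, and an event-driven server would more likely return to its event loop than wait in poll.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all `size` bytes of `img` to the non-blocking
 * descriptor `fd`. Returns 0 on success, -1 on error (errno is set). */
int write_all_nonblocking(int fd, const char *img, size_t size)
{
    size_t sent = 0;
    while (sent < size) {
        ssize_t n = write(fd, img + sent, size - sent);
        if (n >= 0) {
            sent += (size_t)n;   /* partial writes are normal: keep going */
            continue;
        }
        if (errno == EINTR)
            continue;            /* interrupted by a signal: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;           /* real error */
        /* Socket buffer is full: wait until the descriptor is writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```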

For case 3: sendfile sends data from one descriptor to another. That means you can, for example, send data from a file directly to a TCP socket without allocating any additional buffer. So if the image you want to send to a client is in a file, use sendfile. If you have it in memory (because you processed or generated it), use the approach I mentioned earlier.
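
For the file-on-disk case, a minimal sketch might look like the following. It assumes Linux sendfile and a blocking out_fd; the helper name send_path is hypothetical. Because a non-NULL offset is passed, sendfile advances it by the number of bytes sent, which is what drives the loop.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file at `path` to `out_fd`.
 * Returns 0 on success, -1 on error. */
int send_path(int out_fd, const char *path)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;  /* updated by sendfile after each call */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted by a signal: retry */
        if (n <= 0) {  /* error, or unexpected end of file */
            close(in_fd);
            return -1;
        }
    }
    close(in_fd);
    return 0;
}
```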

ArturFH
  • I omitted the error checking for brevity. I am actually writing it in a non-blocking way and am writing the image straight into the buffer from jemalloc. I definitely have the image in memory. What I wanted to know is if I have a mmapped buffer, do I benefit from the sendfile command's improved performance. Asked alternatively is my mmapped buffer considered "DMA-able" if I use a sendfile command on the underlying file? – Rajiv Nov 16 '13 at 20:59
  • @Rajiv Please correct me if I am wrong; I'm trying to understand your problem. You have a file on disk. You open this file and have its file descriptor. Next you mmap this file into memory. What for? After that you call sendfile, giving this file descriptor as its 2nd argument. And after that you want to modify the file by writing to the mmapped memory. Am I right? – ArturFH Nov 16 '13 at 21:43
  • @ Artur R. Czechowski I mmap a bogus file just as a way to get a buffer that is backed by a file. My aim was to then use sendfile on this file to actually transmit my in-memory data without any copies on NICs that support scatter-gather DMA. The question is if my NIC supports scatter-gather DMA, when is it safe to re-use this mmapped memory? – Rajiv Nov 16 '13 at 21:56
  • @Rajiv One more question: do you call mmap with MAP_ANONYMOUS flag? – ArturFH Nov 16 '13 at 22:09
  • No, I don't call it with the MAP_ANONYMOUS flag. I need a real fd that I can use with sendfile. There is a real file, but the aim of the program is NOT disk back-up. I am only using this to get a shared (with the kernel) buffer that I can write my image data into. I use sendfile on the underlying file to write the contents of my shared buffer to the socket. This does seem to happen. The problem is that these writes are async and I have no way of knowing when it is safe to re-use this shared buffer. Look at my answer and the links there for more detail. – Rajiv Nov 16 '13 at 22:24