13

Scatter-gather I/O - readv()/writev()/preadv()/pwritev() - reads or writes a variable number of iovec structs in a single system call. It processes each buffer sequentially, from the 0th iovec to the Nth. However, according to the documentation, readv/writev can also transfer fewer bytes than were requested. I was wondering if there is a standard/best-practice/elegant way to handle that situation.

If we are just handling a bunch of character buffers or similar, this isn't a big deal. But one of the niceties is using scatter-gather for structs and/or discrete variables as the individual iovec items. How do you handle the situation where readv/writev reads or writes only a portion of a struct, or half of a long, or something like that?

Below is some contrived code of what I am getting at:

int fd;

struct iovec iov[3];

long aLong = 74775767;
int  aInt  = 949;
char aBuff[100];  //filled from where ever

ssize_t bytesWritten = 0;
ssize_t bytesToWrite = 0;

iov[0].iov_base = &aLong;
iov[0].iov_len = sizeof(aLong);
bytesToWrite += iov[0].iov_len;

iov[1].iov_base = &aInt;
iov[1].iov_len = sizeof(aInt);
bytesToWrite += iov[1].iov_len;

iov[2].iov_base = &aBuff;
iov[2].iov_len = sizeof(aBuff);
bytesToWrite += iov[2].iov_len;

bytesWritten = writev(fd, iov, 3);

if (bytesWritten == -1)
{
    //handle error
}

if (bytesWritten < bytesToWrite)
    //how to gracefully continue?.........
pynexj
ValenceElectron

3 Answers

15

Use a loop like the following to advance the partially-processed iov:

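/* cur: index of the first unfinished iovec; count: total number of iovecs */
for (;;) {
    written = writev(fd, iov+cur, count-cur);
    if (written < 0) goto error;
    /* skip every iovec that was written completely */
    while (cur < count && (size_t)written >= iov[cur].iov_len)
        written -= iov[cur++].iov_len;
    if (cur == count) break;
    /* advance into the partially written iovec */
    iov[cur].iov_base = (char *)iov[cur].iov_base + written;
    iov[cur].iov_len -= written;
}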

Note that the cur < count check is required: without it, once everything has been written the inner loop would read past the end of iov, where the memory might happen to contain zero.
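For reference, the loop above can be wrapped in a small function. This is only a sketch: `writev_all` is a hypothetical name (not a libc function), and retrying on `EINTR` is an added assumption that the answer's loop does not make. On a nonblocking descriptor it would still return -1 with `EAGAIN`, which the caller has to handle.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling writev() until every byte described
 * by iov[0..count-1] has been written.
 * Returns 0 on success, -1 on error with errno set by writev().
 * The iov array is modified in place as entries are consumed.
 */
static int writev_all(int fd, struct iovec *iov, int count)
{
    int cur = 0;

    while (cur < count) {
        ssize_t written = writev(fd, iov + cur, count - cur);
        if (written < 0) {
            if (errno == EINTR)
                continue;   /* assumption: retry when interrupted by a signal */
            return -1;
        }

        /* Skip every iovec that was written completely. */
        while (cur < count && (size_t)written >= iov[cur].iov_len)
            written -= (ssize_t)iov[cur++].iov_len;

        /* Advance into the iovec that was only partially written. */
        if (cur < count) {
            iov[cur].iov_base = (char *)iov[cur].iov_base + written;
            iov[cur].iov_len -= (size_t)written;
        }
    }
    return 0;
}
```

With the question's variables this would be called as `writev_all(fd, iov, 3)`. Because the array is modified, it has to be refilled before it is reused.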

jap
R.. GitHub STOP HELPING ICE
  • Nice, R, that's right along the lines I was looking for. Pointer arithmetic to the rescue. Is it your opinion that short readv/writev outcomes are a common rather than exception condition, as in standard read/write and this is something which always needs to be coded for? – ValenceElectron May 03 '11 at 14:01
  • 3
    Yes. With pipes, sockets, or ttys in nonblocking mode, short writes are the norm. But even in other situations, they may be unlikely but still a real possibility. Even if `SA_RESTART` is used for the signal handler, if `writev` is interrupted by a signal, it will only get restarted if it had not yet written anything. If a partial write was already completed, it will return the short write. Even if you don't have any signal handlers, the process being stopped and resumed with `SIGSTOP` (non-blockable) and `SIGCONT` will have the same effects. – R.. GitHub STOP HELPING ICE May 03 '11 at 14:39
  • BTW, I just fixed my code. I had forgotten `iov_base` has type `void *` and thus arithmetic on it is invalid. The new version casts to `char *` to perform the arithmetic. – R.. GitHub STOP HELPING ICE May 03 '11 at 14:45
  • This is doing syscalls in a for loop. That's not optimal. The answer here: https://stackoverflow.com/questions/45227781/how-to-deal-if-writev-only-write-part-of-the-data gives a better solution, I think. – clime Jul 26 '19 at 14:11
  • @clime: It is optimal because there is no way to do it with fewer syscalls than the number imposed by the extent of the shortness of individual reads/writes. Of course you need a loop when you can't control the shortness. – R.. GitHub STOP HELPING ICE Jul 26 '19 at 16:13
  • @clime: Moreover the answer you linked is wrong in claiming that short writes can't arise for a blocking fd, and otherwise saying the same thing my answer here does except not actually writing out the loop (just saying effectively "retry with..."). – R.. GitHub STOP HELPING ICE Jul 26 '19 at 16:15
  • @R: ok, sry, i didn't read the code properly. I now see you are jumping over multiple iovec elements in the inner loop. – clime Jul 26 '19 at 18:15
2

AFAICS the vectored read/write functions work the same wrt short reads/writes as the normal ones. That is, you get back the number of bytes read/written, but this might well point into the middle of a struct, just like with read()/write(). There is no guarantee that the possible "interruption points" (for lack of a better term) coincide with the vector boundaries. So unfortunately the vectored IO functions offer no more help for dealing with short reads/writes than the normal IO functions. In fact, it's more complicated since you need to map the byte count into an IO vector element and offset within the element.
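As a rough illustration of that mapping, the sketch below takes the byte count returned by a short readv()/writev() and finds the iovec element and the offset within it where the transfer stopped. The name `iov_locate` and its signature are made up for this example.

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: locate where a short transfer of nbytes stopped.
 * On return, *index is the first iovec that was not fully transferred
 * (equal to count if everything was transferred) and *offset is the
 * number of bytes already transferred within that iovec.
 */
static void iov_locate(const struct iovec *iov, int count, size_t nbytes,
                       int *index, size_t *offset)
{
    int i = 0;

    /* Walk past every element that nbytes covers completely. */
    while (i < count && nbytes >= iov[i].iov_len) {
        nbytes -= iov[i].iov_len;
        i++;
    }
    *index = i;
    *offset = (i < count) ? nbytes : 0;
}
```

For elements of 8, 4 and 100 bytes, a return value of 10 maps to index 1 with offset 2, that is, the transfer stopped in the middle of the second item.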

Also note that the idea of using vectored IO for individual structs or data items might not work that well; the max allowed value for the iovcnt argument (IOV_MAX) is usually quite small, something like 1024 or so. So if your data is contiguous in memory, just pass it as a single element rather than artificially splitting it up.
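If you want to check the limit rather than guess, something along these lines should work on a POSIX system. It is a sketch only: `iov_limit` is a hypothetical helper name, and the final fallback of 16 is the POSIX minimum (`_XOPEN_IOV_MAX`), used here on the assumption that neither the runtime nor the compile-time value is available.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: the largest iovcnt accepted per readv()/writev() call. */
static long iov_limit(void)
{
    long n = sysconf(_SC_IOV_MAX);  /* runtime limit; -1 if indeterminate */
    if (n > 0)
        return n;
#ifdef IOV_MAX
    return IOV_MAX;                 /* compile-time limit from <limits.h> */
#else
    return 16;                      /* _XOPEN_IOV_MAX, the POSIX minimum */
#endif
}
```

A caller with more elements than `iov_limit()` returns would have to split the array across several calls.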

janneb
-2

A vectored write will write all the data you have provided in one call to the "writev" function, so bytesWritten will always be equal to the total number of bytes provided as input. This is my understanding.

Please correct me if I am wrong

Arunmu
  • If the medium you are writing to runs out of space before all the data is written, the bytes written will be fewer than the bytes requested. – Jonathan Leffler May 02 '11 at 05:57
  • For that, according to unix man pages you should get "EDQUOT" error. – Arunmu May 02 '11 at 06:02
  • It might be as you say in which case my question is more or less moot. Logically that would make sense but the man page doesn't seem to be definitive. In addition to Jonathan's case, on the reading end I imagine it is possible to come up short if you are reading a file that another process is writing but maybe that is not a good use of this technique. – ValenceElectron May 02 '11 at 06:22
  • 1
    See: http://pubs.opengroup.org/onlinepubs/9699919799/functions/writev.html for POSIX's rules on `writev()`; `pwritev()` is not in POSIX. POSIX does not mention EDQUOT; when there is simply no space left you should get ENOSPC. **However**, I suspect you are arguing that you do not get a short write indication; you get an error indication. POSIX `write()` page says _'If a write() requests that more bytes be written than there is room for (for example, the physical end of a medium), only as many bytes as there is room for shall be written.'_ followed by an example of when a short write occurs. – Jonathan Leffler May 02 '11 at 06:33
  • Just curious, the POSIX spec states that, _"The writev() function shall always write a complete area before proceeding to the next."_ Does that mean that `writev()` won't bail in the middle of an iovec, i.e., it will, if it has to exit early, only exit after a whole iovec has been written, or else return an error? – Jason May 02 '11 at 17:01
  • 1
    @Jason: No, it simply means that it won't proceed to `iov[n+1]` without having fully written `iov[n]`. It can still return a short write without having fully written `iov[n]`. – R.. GitHub STOP HELPING ICE May 02 '11 at 19:32