how to get the minimum amount of bytes to read() from a file descriptor?

Question

Edit

My primary goal was just to flush a readable file descriptor after select() reported incoming data. I have now achieved this by simply giving read() a big enough buffer, as pointed out by Basile Starynkevitch. That is why I marked his answer as accepted.

The question in the title is not answered yet though: how do I get the minimum number of bytes I can read from a file descriptor like this:

min_size = fd_get_chunksize(file_descriptor);

which might return 1, 4, 8 or something else.

Original Question

I have a couple of file descriptors created in different ways. For example, one is created with timerfd_create() and configured to fire once a second.

When select() signals traffic on a certain FD I want to flush it. For the ones created with timerfd_create() I have to read at least 8 bytes:

if (select(fd + 1, &l_fdsRd, NULL, NULL, &l_timeOut) > 0) {
    unsigned long long data;   /* 8 bytes on Linux, as a timerfd requires */
    ssize_t count;             /* read() returns ssize_t, not int */
    while ((count = read(fd, &data, sizeof(data))) > 0) {
        printf("%zd %llu\n", count, data);
    }
}

When data is declared as char and thus sizeof(data) is 1, count is always 0 and my file descriptor never gets flushed.

In case I have more than one file descriptor to flush (maybe created differently), I have to know, for every file descriptor, how many bytes I must read to flush it.

Is there a way to get this number of bytes for an existing FD?

Is there another way to flush a file descriptor I've created with timerfd_create()? (I read "Empty or 'flush' a file descriptor without read()?" but it gave me no answer.) Actually I don't want to read the content; I just want to make the descriptor ready for select() again.

This would appear to be C++, not C...may apply to both languages, but please update your tags accordingly. — tonysdg, Dec 01 '15 at 18:00
IMHO, `timerfd_create` should have been mentioned in the title! — Basile Starynkevitch, Dec 01 '15 at 18:06
Maybe my post was misleading. FDs created with `timerfd_create` are just one sort of file descriptors I want to monitor. After `select()` returned on one of them I have to flush it. Since I want to do this generically I have to know *how* to flush them. E.g. I have to know how many bytes to read. — frans, Dec 01 '15 at 18:18
@tonysdg: It's a C question. I just forgot to remove the `std::cout`. But `timerfd_create`, `select()`, `read()` etc. are not C++ — frans, Dec 01 '15 at 18:25
Did you consider the FIONREAD ioctl(2)? I don't recommend it, but it probably gives the number of readable bytes you are dreaming about. BTW, perhaps you just want non-blocking IO or async IO à la [aio_read(3)](http://man7.org/linux/man-pages/man3/aio_read.3.html)... You really should motivate your question (why do you need to flush input?) and explain *why you are asking it* by editing it to improve it — Basile Starynkevitch, Dec 02 '15 at 13:45

score 1 · Accepted Answer · edited May 23 '17 at 12:15

Read timerfd_create(2) carefully:

Operating on a timer file descriptor

  The file descriptor returned by timerfd_create() supports the
  following operations:

  read(2)
          If the timer has already expired one or more times since its
          settings were last modified using timerfd_settime(), or since
          the last successful read(2), then the buffer given to read(2)
          returns an unsigned 8-byte integer (uint64_t) containing the
          number of expirations that have occurred.  (The returned value
          is in host byte order—that is, the native byte order for
          integers on the host machine.)

          If no timer expirations have occurred at the time of the
          read(2), then the call either blocks until the next timer
          expiration, or fails with the error EAGAIN if the file
          descriptor has been made nonblocking (via the use of the
          fcntl(2) F_SETFL operation to set the O_NONBLOCK flag).

          A read(2) will fail with the error EINVAL if the size of the
          supplied buffer is less than 8 bytes.

  poll(2), select(2) (and similar)
          The file descriptor is readable (the select(2) readfds
          argument; the poll(2) POLLIN flag) if one or more timer
          expirations have occurred.

So you really should read an unsigned 8-byte integer when the file descriptor is readable. Notice that you cannot read just a single byte: read(2) then fails with EINVAL, as the quoted text says.

Hence declare

uint64_t data;

For ordinary file descriptors, you usually know how many bytes you should read: perhaps it is a pipe or a socket (or a mouse device) carrying small fixed-length messages. But in general, you had better read into a large enough buffer (typically several kilobytes, up to a megabyte; 64 KiB = 65536 bytes is a reasonable tradeoff). Notice that on success read(2) returns a byte count, so it can be a partial read. If some bytes remain immediately readable, the next poll(2) (or the nearly obsolete select) will succeed immediately.
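The large-buffer advice could be turned into a generic "flush" routine along these lines. This is only a sketch: `drain_fd` is a hypothetical name, the 64 KiB buffer size is the tradeoff suggested above, and the function leaves O_NONBLOCK set on the descriptor, which may or may not suit the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: discard everything currently readable on fd without
 * blocking. Returns the number of bytes discarded, or -1 on error. */
long drain_fd(int fd)
{
    char buffer[65536];   /* at least 8 bytes, so a timerfd accepts it too */

    /* Nonblocking mode makes read() fail with EAGAIN instead of waiting. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buffer, sizeof buffer);
        if (n > 0) {
            total += n;   /* possibly a partial read: keep going */
            continue;
        }
        if (n == 0)
            break;        /* end of file */
        if (errno == EINTR)
            continue;     /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;        /* nothing left to read right now */
        return -1;        /* a real error */
    }
    return total;
}
```

Because the buffer is larger than any minimum a descriptor is likely to impose, the caller does not need to know how the descriptor was created.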

See also the paragraph about Pipe Capacity in pipe(7)

You might also consider the old FIONREAD ioctl(2), but I don't recommend using it: it is not very portable and its semantics are not very well defined (it might give the number of bytes available to read). See this.
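For reference, the ioctl that reports a count of readable bytes is FIONREAD (FIONBIO, by contrast, toggles nonblocking mode). A minimal sketch, assuming Linux, where `<sys/ioctl.h>` provides FIONREAD; the helper name `bytes_available` is hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel how many bytes are readable on fd.
 * Returns the count, or -1 if the descriptor does not support FIONREAD. */
int bytes_available(int fd)
{
    int pending = 0;
    if (ioctl(fd, FIONREAD, &pending) < 0)
        return -1;
    return pending;
}
```

This works for pipes, sockets and terminals, but not every kind of descriptor implements it, so a generic flush routine should not rely on it.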

Avoid read(2)-ing into very small buffers (of a few bytes). In particular, reading one byte at a time always gives abysmal performance.

BTW, some hardware block devices may want read(2) to be done in multiples of some block size, which generally fits in a few kilobytes (e.g. a page or two). YMMV.
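As far as I know there is no generic call that returns a minimum read size for an arbitrary descriptor. The closest standard hint is the `st_blksize` field filled in by fstat(2), which is the preferred block size for efficient I/O. It is a hint, not a guaranteed minimum, and it says nothing about the 8-byte rule of a timerfd. A sketch, with `preferred_io_size` as a hypothetical helper name:

```c
#include <sys/stat.h>

/* Hypothetical helper: return the preferred I/O block size reported by
 * fstat() for fd, or -1 if fstat() fails. This is a hint, not a minimum. */
long preferred_io_size(int fd)
{
    struct stat info;
    if (fstat(fd, &info) < 0)
        return -1;
    return (long)info.st_blksize;
}
```

A buffer of at least this size (or simply a few kilobytes) is a reasonable default when nothing else is known about the descriptor.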

Perhaps asynchronous IO (see aio(7)) might be useful.

This is right but not the answer to my question. I've updated the title - maybe it was misleading. I need to know _how many_ bytes to read for an _arbitrary file descriptor_. — frans, Dec 01 '15 at 18:20
What file descriptor? You know how many bytes you should read! — Basile Starynkevitch, Dec 01 '15 at 18:45
No I don't (yet). I have a function which takes an arbitrary number of file descriptors I want to monitor using `select()` and flush afterwards. At this place I don't know how they were created. They might be pipes, timers or BSD sockets. Maybe I have to provide an integer for each, indicating how many bytes I have to read at minimum to flush it. But I'd like to avoid this. — frans, Dec 01 '15 at 18:58
That's not an option: this way `read()` would either block or return `0` again in case there's not enough data. I want to implement a function `flush(FD)` which takes a file descriptor and deletes _all_ data in a _non blocking_ way. — frans, Dec 01 '15 at 19:02
No, `read` can be a partial read, it won't block if *some* bytes have been read — Basile Starynkevitch, Dec 01 '15 at 19:03
Actually not always. I tried to read bytewise from a FD created with `timerfd_create()` which had 8 bytes of data in it. `read()` then returns 0 for me and the data remains. — frans, Dec 01 '15 at 19:06
But I did suggest using a reasonably sized buffer (of several kilobytes). — Basile Starynkevitch, Dec 01 '15 at 19:06
Maybe I misunderstood your point. Currently I cannot check it, but providing a larger array sounds pretty plausible (maybe I've been working too long and didn't see the point :)). I'll accept your answer then. — frans, Dec 01 '15 at 19:14
