17

I'm wondering if there's an easy way to iterate through a fd_set? The reason I want to do this is to not having to loop through all connected sockets, since select() alters these fd_sets to only include the ones I'm interested about. I also know that using an implementation of a type that is not meant to be directly accessed is generally a bad idea since it may vary across different systems. However, I need some way to do this, and I'm running out of ideas. So, my question is:

How do I iterate through an fd_set? If this is a really bad practice, are there any other ways to solve my "problem" except from looping through all connected sockets?

Thanks

Thanatos
  • 42,585
  • 14
  • 91
  • 146
Andreas
  • 541
  • 2
  • 8
  • 16
  • To emphasize what I mean. I do not want to use the FD_ISSET approach since it requires me to loop through all connected sockets. But since, by definition, select() removes non-relevant file descriptors from the set, I want to loop through the set. – Andreas Sep 07 '10 at 18:26
  • 6
    It doesn't necessarily mean "all connected". You can pass a subset of your connected sockets to select and then use FD_ISSET on only that subset after select returns. Also, is there an actual problem with looping over all of them? Unless you're dealing with many thousands of connected sockets, the loop will likely take an inconsequential amount of time. – Rakis Sep 07 '10 at 18:34
  • 1
    Agree with Rakis. This is one of those things that seems to be inefficient but in most cases really isn't The time to go through the loop will be dwarfed by the time it takes to service just one of the set FDs. – Duck Sep 07 '10 at 18:50
  • There are going to be a substantial amount of connected sockets, therefore it just seems bad to loop through all of them. I can't find it atm but I'm pretty sure I've read select() will remove from the fd_set I pass in the sockets which are not ready to read from when returning. That is the functionality I'm looking for. (Perhaps I'm miss remembering, since I can't find it atm.) – Andreas Sep 07 '10 at 18:52
  • Sure but let's say you have 10,000 connections and 5000 have been set after select(). What's the cost of 5k (unset) bit tests versus what it going to take to read/write/whatever the 5000 that were set? – Duck Sep 07 '10 at 19:00
  • Yes, I do see your point. BUt let's say we have 10 000 open connetions, and select() returns after only 20 or so needs to be processed. Then we have an overlay of alot of connections we need to loop through. Surely there must be a more efficient way? – Andreas Sep 07 '10 at 19:03
  • Please see my answer once.According to me, any alternative method will be even worse with respect to efficiency. – lalli Sep 07 '10 at 19:08
  • 1
    @Andreas: 10000 open connections and all transactions are handled serially? – Potatoswatter Sep 07 '10 at 19:08
  • iterate.. equals looping, right? So you're asking "How do I loop, without looping?". What is the goal here? I do not understand this, because -- You don't want to use FD_ISSET .. but since select removes .. file descriptors .. you want to loop through the set? -- Also, I agree with Rakis and Duck here. – default Sep 07 '10 at 19:12
  • @Potatoswatter: The current model works like this: All connected sockets are waited serially by the same select(), and then their requests rather than their connections get passed onto different threads. Perhaps not the best model, nonetheless I still feel like what I want to do should be doable? – Andreas Sep 07 '10 at 19:15
  • Michael: Indeed iterating and looping refers to the same thing. I'm not asking how to loop without looping. I'm asking how I can loop through the fd_set that is returned from select as an out parameter, rather than looping through my std::set of connected sockets and calling FD_ISSET() to see if they are indeed in there. – Andreas Sep 07 '10 at 19:17
  • 2
    @Andreas: What you have here is called a bottleneck. There are other things that can go wrong besides FD_ISSET being to slow, to say the least. Just divide connections among several dispatcher threads. – Potatoswatter Sep 07 '10 at 19:21
  • @Andreas: you really shouldn't be using `select()` - but either `poll()` or a wrapper library like [libevent](http://monkey.org/~provos/libevent/). Otherwise, `man ffs` - `fd_set` is a plain bitmap on all systems (except Windows). – Dummy00001 Sep 07 '10 at 22:15
  • @Andreas: aah! Now I understand :) Although, I still think you are looking at the wrong end.. I believe that you can run your program through a profiler too find bottlenecks. I do not think this loop will show up as one of the big ones (but do correct me if I'm wrong :) : http://stackoverflow.com/questions/67554/whats-the-best-free-c-profiler-for-windows-if-there-are – default Sep 08 '10 at 09:44
  • http://stackoverflow.com/questions/2589096/find-most-significant-bit-left-most-that-is-set-in-a-bit-array might be useful, to skip useless loops. Uses __builtin_clz which might have a corresponding ASM-Ccall on your machine. – MaPePeR Apr 22 '13 at 10:35

8 Answers8

12

You have to fill in an fd_set struct before calling select(), you cannot pass in your original std::set of sockets directly. select() then modifies the fd_set accordingly, removing any sockets that are not "set", and returns how many sockets are remaining. You have to loop through the resulting fd_set, not your std::set. There is no need to call FD_ISSET() because the resulting fd_set only contains "set" sockets that are ready, eg:

fd_set read_fds;
FD_ZERO(&read_fds);

int max_fd = 0;

read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    read_fds.fd_array[i] = connected_sockets[i];
    if (read_fds.fd_array[i] > max_fd)
      max_fd = read_fds.fd_array[i];
}

if (select(max_fd+1, &read_fds, NULL, NULL, NULL) > 0)
{ 
    for( int i = 0; i < read_fds.fd_count; ++i ) 
        do_socket_operation( read_fds.fd_array[i] ); 
} 

Where FD_ISSET() comes into play more often is when using error checking with select(), eg:

fd_set read_fds;
FD_ZERO(&read_fds);

fd_set error_fds;
FD_ZERO(&error_fds);

int max_fd = 0;

read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    read_fds.fd_array[i] = connected_sockets[i];
    if (read_fds.fd_array[i] > max_fd)
      max_fd = read_fds.fd_array[i];
}

error_fds.fd_count = read_fds.fd_count;
for( int i = 0; i < read_fds.fd_count; ++i ) 
{
    error_fds.fd_array[i] = read_fds.fd_array[i];
}

if (select(max_fd+1, &read_fds, NULL, &error_fds, NULL) > 0)
{ 
    for( int i = 0; i < read_fds.fd_count; ++i ) 
    {
        if( !FD_ISSET(read_fds.fd_array[i], &error_fds) )
            do_socket_operation( read_fds.fd_array[i] ); 
    }

    for( int i = 0; i < error_fds.fd_count; ++i ) 
    {
        do_socket_error( error_fds.fd_array[i] ); 
    }
} 
Remy Lebeau
  • 555,201
  • 31
  • 458
  • 770
  • The [select manpage](http://linux.die.net/man/2/select) says: `nfds is the highest-numbered file descriptor in any of the three sets, plus 1. ` Use the *highest-numbered*, not the *count*! – MaPePeR Apr 22 '13 at 12:37
  • @RemyLebeau, if you set `error_fds.fd_count = read_fds.fd_count;` then wouldn't it be better if you use only 1 `for` loop in the `if(select...)` statement, and I guess there are some errors with checking `FD_ISSET` for `read_fds` and `error_fds`. (i.e. `FD_ISSET(read_fds)` is not checked at all) – RIscRIpt Dec 01 '14 at 16:27
  • Checking `FD_ISSET(read_fds.fd_array[i], &read_fds)` would be redundant in this example because the loop is already iterating through the items of `read_fds` that were "set". The loop is checking if each "readible" item is not also in `error_fds` before processing it. To use a single loop for both `fd_set` at the same time would require different loop logic: `for (int fd = 0; fd <= max_fd; ++fd) { if (FD_ISSET(fd, &error_fds) {...} else if (FD_ISSET(fd, &read_fds)) {...} }`, but that is not portable to all platforms (Windows in particular). – Remy Lebeau Dec 01 '14 at 17:05
  • Of course, looping through `fd_count` directly isn't really portable, either. `fd_set` is meant to be opaque, you really should not be accessing its elements directly. The `FD_...()` macros are meant to hide away those details. – Remy Lebeau Dec 01 '14 at 17:48
  • Instead of using `for (int fd = 0; fd <= max_fd; ++fd)`, I would probably loop through `connected_sockets`, passing each one to `FD_ISSET()`. – Remy Lebeau Dec 01 '14 at 17:50
7

Select sets the bit corresponding to the file descriptor in the set, so, you need-not iterate through all the fds if you are interested in only a few (and can ignore others) just test only those file-descriptors for which you are interested.

if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
   perror("select");
   exit(4);
}

if(FD_ISSET(fd0, &read_fds))
{
   //do things
}

if(FD_ISSET(fd1, &read_fds))
{
   //do more things
}

EDIT
Here is the fd_set struct:

typedef struct fd_set {
        u_int   fd_count;               /* how many are SET? */
        SOCKET  fd_array[FD_SETSIZE];   /* an array of SOCKETs */
} fd_set;

Where, fd_count is the number of sockets set (so, you can add an optimization using this) and fd_array is a bit-vector (of the size FD_SETSIZE * sizeof(int) which is machine dependent). In my machine, it is 64 * 64 = 4096.

So, your question is essentially: what is the most efficient way to find the bit positions of 1s in a bit-vector (of size around 4096 bits)?

I want to clear one thing here:
"looping through all the connected sockets" doesn't mean that you are actually reading/doing stuff to a connection. FD_ISSET() only checks weather the bit in the fd_set positioned at the connection's assigned file_descriptor number is set or not. If efficiency is your aim, then isn't this the most efficient? using heuristics?

Please tell us what's wrong with this method, and what are you trying to achieve using the alternate method.

lalli
  • 6,083
  • 7
  • 42
  • 55
  • Thanks to you too. But please see my comment, perhaps I was not explaining explicitly enough this is the approach I do not wish to take. – Andreas Sep 07 '10 at 18:31
  • 3
    If this is not the answer that [is correct/you want], why has it been marked as the answer? – Greg Domjan Sep 07 '10 at 20:11
  • For two reasons. a) the edit provided the information I was looking for b) I changed my mind and therefore the answer became relevant. – Andreas Sep 14 '10 at 14:34
  • 5
    The definition of fd_set itself is dependent on the operating system. Linux's fd_set does not have an fd_count member. – Conspicuous Compiler Jul 31 '11 at 19:14
  • If performance > portability, the x86_64 instruction set has some instructions for REALLY REALLY fast scanning for bits in a machine word. So just construct a simple assembly function to do the scanning. – Mark K Cowan Aug 19 '14 at 08:51
  • `FD_SETSIZE * sizeof(int)` doesn't make any sense. As far as I know, `FD_SETSIZE` is the number of file descriptors in the bit map and there's no reason to multiply it by magic constants. Also the pseudo-C defining the bit array IMO only creates more confusion. – Pavel Šimerda Feb 09 '16 at 08:52
4

It's fairly straight-forward:

for( int fd = 0; fd < max_fd; fd++ )
    if ( FD_ISSET(fd, &my_fd_set) )
        do_socket_operation( fd );
Rakis
  • 7,779
  • 24
  • 25
4

This looping is a limitation of the select() interface. The underlying implementations of fd_set are usually a bit set, which obviously means that looking for a socket requires scanning over the bits.

It is for precisely this reason that several alternative interfaces have been created - unfortunately, they are all OS-specific. For example, Linux provides epoll, which returns a list of only the file descriptors that are active. FreeBSD and Mac OS X both provide kqueue, which accomplishes the same result.

caf
  • 233,326
  • 40
  • 323
  • 462
1

See this section 7.2 of Beej's guide to networking - '7.2. select()—Synchronous I/O Multiplexing' by using FD_ISSET.

in short, you must iterate through an fd_set in order to determine whether the file descriptor is ready for reading/writing...

t0mm13b
  • 34,087
  • 8
  • 78
  • 110
  • Thanks for the answer. I know this is the standard approach, however I'm looking to sway from it, please see my comment on my own post. – Andreas Sep 07 '10 at 18:28
0

I don't think you could do much using the select() call efficiently. The information at "The C10K problem" are still valid.

You will need some platform specific solutions:

Or you could use an event library to hide the platform detail for you libev

qunying
  • 428
  • 3
  • 4
0

I don't think what you are trying to do is a good idea.

Firstly its system dependent, but I believe you already know it.

Secondly, at the internal level these sets are stored as an array of integers and fds are stored as set bits. Now according to the man pages of select the FD_SETSIZE is 1024. Even if you wanted to iterate over and get your interested fd's you have to loop over that number along with the mess of bit manipulation. So unless you are waiting for more than FD_SETSIZE fd's on select which I don't think so is possible, its not a good idea.

Oh wait!!. In any case its not a good idea.

aeh
  • 779
  • 5
  • 8
0

ffs() may be used on POSIX or 4.3BSD for bits iteration, though it expects int (long and long long versions are glibc extensions). Of course, you have to check, if ffs() optimized as good as e.g. strlen and strchr.