52

I'm writing a point-to-point message queue system, and it has to be able to operate over UDP. I could arbitrarily pick one side or the other to be the "server" but it doesn't seem quite right since both ends are sending and receiving the same type of data from the other.

Is it possible to bind() and connect() both ends so that they send/receive only from each other? That seems like a nicely symmetric way to do it.

gct
  • 2
    Seems a bit strange, but I don't see why not. `connect()` just sets the default destination address/port for the socket. (Have you tried it? If it doesn't work for some reason, just use `sendto()`.) Personally I'd just use `sendto()` because otherwise you'll get confused if multiple clients connect to your server. – mpontillo Mar 16 '12 at 17:37

10 Answers

62

Hello from the distant future which is the year 2018, to the year 2012.

There is, in fact, a practical reason for connect()ing a UDP socket (though blessed POSIX and its implementations don't, in theory, require you to).

An ordinary UDP socket doesn't know anything about its future destinations, so it performs a route lookup each time sendmsg() is called.

However, if connect() is called beforehand with a particular remote receiver's IP and port, the operating system kernel can store a reference to the route on the socket. Subsequent sendmsg() calls that do not specify a receiver then use this default destination and skip the lookup, which makes sending significantly faster. If a call does specify a receiver, the stored setting is ignored for that call.

Look at the lines 1070 through 1171:

if (connected)
    rt = (struct rtable *)sk_dst_check(sk, 0);

if (!rt) {
    [..skip..]

    rt = ip_route_output_flow(net, fl4, sk);

    [..skip..]
}

Until Linux kernel 4.18, this feature had been mostly limited to the IPv4 address family only. However, since 4.18-rc4 (and hopefully Linux kernel release 4.18 as well), it's fully functional with IPv6 sockets as well.

It may be a source of a serious performance benefit, though it will heavily depend on the OS you're using. At least, if you're using Linux and don't use the socket for multiple remote handlers, you should give it a try.
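To make the two calling styles concrete, here is a minimal sketch assuming IPv4 over loopback. The helper names (`loopback_addr`, `send_unconnected`, `connect_peer`, `send_connected`) are made up for illustration and are not part of any library. Whether the connected style is actually faster depends on the kernel, as described above.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: IPv4 loopback address for a port in host byte order. */
struct sockaddr_in loopback_addr(unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return addr;
}

/* Unconnected style: the destination is passed on every call. */
ssize_t send_unconnected(int fd, unsigned short port, const char *msg)
{
    struct sockaddr_in dst = loopback_addr(port);
    assert(msg != NULL);
    return sendto(fd, msg, strlen(msg), 0,
                  (const struct sockaddr *)&dst, sizeof dst);
}

/* Connected style, step 1: record the default destination once. */
int connect_peer(int fd, unsigned short port)
{
    struct sockaddr_in dst = loopback_addr(port);
    return connect(fd, (const struct sockaddr *)&dst, sizeof dst);
}

/* Connected style, step 2: plain send() needs no address argument. */
ssize_t send_connected(int fd, const char *msg)
{
    assert(msg != NULL);
    return send(fd, msg, strlen(msg), 0);
}
```

A caller would invoke `connect_peer()` once after creating the socket and then use `send_connected()` for every message.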

ximaera
  • 1
    Could you please explain what a route lookup is? Is it just CPU cycles to determine which Ethernet device the datagram has to use? Or is it going to cause some kind of I/O which could take significant time? – Secto Kia Feb 19 '19 at 11:22
  • 2
    @SectoKia no, that's just a hashtable lookup. The hashtable stays in RAM, so it's just CPU cycles and RAM lookups. – ximaera Feb 19 '19 at 12:40
  • 2
    I'm back looking at a similar problem, so hello from the distant future of 2019! – gct May 14 '19 at 21:39
28

UDP is connectionless, so there's little sense for the OS in actually making some sort of connection.

In BSD sockets one can call connect on a UDP socket, but this basically just sets the default destination address for send (instead of giving it explicitly to sendto).

Bind on a UDP socket tells the OS on which local interface address to accept incoming packets (all packets to other addresses are dropped), regardless of the kind of socket.

On receiving, you must use recvfrom to identify which source the packet came from. Note that if you want some sort of authentication, then using just the addresses involved is as insecure as no lock at all. TCP connections can be hijacked, and naked UDP literally has IP spoofing written all over its head. You must add some sort of HMAC.
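As a rough illustration of the bind and recvfrom points, here is a small sketch assuming IPv4 on loopback. `bind_loopback` and `recv_with_source` are hypothetical helper names. As noted above, the reported source address is not proof of identity.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* bind() selects the LOCAL address and port the socket accepts packets on.
 * Port 0 asks the OS to pick a free port. Returns 0 on success, -1 on error. */
int bind_loopback(int fd, unsigned short port)
{
    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return bind(fd, (const struct sockaddr *)&local, sizeof local);
}

/* Receive one datagram and report the sender's port (host byte order).
 * The source is whatever the packet claims; it can be spoofed. */
ssize_t recv_with_source(int fd, char *buf, size_t cap, unsigned short *src_port)
{
    struct sockaddr_in src;
    socklen_t srclen = sizeof src;
    ssize_t n;

    assert(buf != NULL && src_port != NULL);
    n = recvfrom(fd, buf, cap, 0, (struct sockaddr *)&src, &srclen);
    if (n >= 0)
        *src_port = ntohs(src.sin_port);
    return n;
}
```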

datenwolf
  • 13
    Well connect() on a SOCK_DGRAM socket sets the default/send receive address, so then you can just use send and recv. I'm writing it to work over TCP as well so that ends up making some other code common to both protocols. – gct Mar 16 '12 at 18:18
  • @gct: Indeed. I wasn't so sure about this at first and had to look up the connect manpage of BSD sockets (I had only the Linux one here, and I don't consider it authoritative for all OSs). – datenwolf Mar 16 '12 at 18:59
  • @datenwolf are you saying Linux doesn't implement BSD sockets? – nhed Jun 01 '13 at 14:36
  • 2
    @nhed: No, what I mean is that Linux supports a *superset* of BSD sockets with several extensions. I had to look up a vanilla reference page to make sure I wasn't writing about Linux-specific extensions. – datenwolf Jun 01 '13 at 14:49
  • @datenwolf ok, I just remember that a pretty old (Stevens) book mentioning connect(), so I assume that is implemented all over – nhed Jun 01 '13 at 15:06
  • @datenwolf: Don't you mean (all packets *from* other addresses are dropped) ? – user1511417 Dec 05 '16 at 22:23
  • 1
    @user1511417: No. You can still use `recvfrom` on a UDP socket on which `bind()` was called on, and it will still accept packets *from* arbitrary addresses. – datenwolf Dec 06 '16 at 10:45
  • "Bind on a UDP socket tells the OS for which incoming address to actually accept packets (all packets to other addresses are dropped)"; shouldn't this be "Bind on a UDP socket tells the OS **from** which incoming address to actually accept packets (all packets **from** other addresses are dropped)"? – Daniel Griscom Nov 27 '21 at 12:54
  • @DanielGriscom: you're thinking of the `connect` system call. `bind` assigns a socket to a *local* address. This is relevant, for example, if you're running multiple daemons for the same service (such as DNS), serving different addresses. The djbdns manual, for instance, explicitly describes how to run multiple tinydns instances on 127.x.y.z:53 proxied behind a single publicly reachable dnscache. This distinction of destination is set using `bind`! – datenwolf Nov 27 '21 at 13:35
  • 1
    @datenwolf Ah: I read "incoming address" as the address of the remote machine. Perhaps "local address" would be clearer? – Daniel Griscom Nov 27 '21 at 14:20
  • @DanielGriscom I tried to clarify it (trying to keep the term *incoming* as it is used in the documentation of some implementations of the API, as well as in documentation of some related libraries). – datenwolf Nov 27 '21 at 15:11
17

Here's a program that demonstrates how to bind() and connect() on the same UDP socket to a specific set of source and destination ports respectively. The program can be compiled on any Linux machine and has the following usage:

usage: ./<program_name> dst-hostname dst-udpport src-udpport

I tested this code opening two terminals. You should be able to send a message to the destination node and receive messages from it.

In terminal 1 run

./<program_name> 127.0.0.1 5555 5556

In terminal 2 run

./<program_name> 127.0.0.1 5556 5555

Even though I've only tested it on a single machine, I think it should also work on two different machines once you've set up the correct firewall settings.

Here's a description of the flow:

  1. Set up hints indicating that the destination address is for UDP (datagram) communication
  2. Use getaddrinfo to obtain the address info structure dstinfo based on argument 1 which is the destination address and argument 2 which is the destination port
  3. Create a socket with the first valid entry in dstinfo
  4. Use getaddrinfo to obtain the address info structure srcinfo primarily for the source port details
  5. Use srcinfo to bind to the socket obtained
  6. Now connect to the first valid entry of dstinfo
  7. If all is well enter the loop
  8. The loop uses a select to block on a read descriptor list which consists of the STDIN and sockfd socket created
  9. If STDIN has an input it is sent to the destination UDP connection using sendall function
  10. If the user types EOM, the loop is exited.
  11. If sockfd has some data it is read through recv
  12. If recv returns -1 it is an error we try to decode it with perror
  13. If recv returns 0, the program treats it as the remote node having closed the connection. But I believe this has no consequence with UDP, which is connectionless (there, a return of 0 just means a zero-length datagram arrived).

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/select.h>

#define STDIN 0

int sendall(int s, char *buf, int *len)
{
    int total = 0;        // how many bytes we've sent
    int bytesleft = *len; // how many we have left to send
    int n;

    while(total < *len) {
        n = send(s, buf+total, bytesleft, 0);
        if (n == -1) { break; }   // check for failure before reporting anything
        fprintf(stdout,"Sendall: %s\n",buf+total);
        total += n;
        bytesleft -= n;
    }

    *len = total; // return number actually sent here

    return n==-1?-1:0; // return -1 on failure, 0 on success
} 

int main(int argc, char *argv[])
{
   int sockfd = -1;   // -1 until socket() succeeds, so the early-exit path never closes a garbage fd
   struct addrinfo hints, *dstinfo = NULL, *srcinfo = NULL, *p = NULL;
   int rv = -1, ret = -1, len = -1,  numbytes = 0;
   struct timeval tv;
   char buffer[256] = {0};
   fd_set readfds;

   // don't care about writefds and exceptfds:
   //     select(STDIN+1, &readfds, NULL, NULL, &tv);

   if (argc != 4) {
      fprintf(stderr,"usage: %s dst-hostname dst-udpport src-udpport\n", argv[0]);
      ret = -1;
      goto LBL_RET;
   }


   memset(&hints, 0, sizeof hints);
   hints.ai_family = AF_UNSPEC;
   hints.ai_socktype = SOCK_DGRAM;        //UDP communication

   /*For destination address*/
   if ((rv = getaddrinfo(argv[1], argv[2], &hints, &dstinfo)) != 0) {
      fprintf(stderr, "getaddrinfo for dest address: %s\n", gai_strerror(rv));
      ret = 1;
      goto LBL_RET;
   }

   // loop through all the results and make a socket
   for(p = dstinfo; p != NULL; p = p->ai_next) {

      if ((sockfd = socket(p->ai_family, p->ai_socktype,
                  p->ai_protocol)) == -1) {
         perror("socket");
         continue;
      }
      /*Taking first entry from getaddrinfo*/
      break;
   }

   /*Failed to get socket to all entries*/
   if (p == NULL) {
      fprintf(stderr, "%s: Failed to get socket\n", argv[0]);
      ret = 2;
      goto LBL_RET;
   }

   /*For source address*/
   memset(&hints, 0, sizeof hints);
   hints.ai_family = p->ai_family;        // local address must match the socket's family
   hints.ai_socktype = SOCK_DGRAM;        //UDP communication
   hints.ai_flags = AI_PASSIVE;     // fill in my IP for me
   /*For source address*/
   if ((rv = getaddrinfo(NULL, argv[3], &hints, &srcinfo)) != 0) {
      fprintf(stderr, "getaddrinfo for src address: %s\n", gai_strerror(rv));
      ret = 3;
      goto LBL_RET;
   }

   /*Bind this datagram socket to source address info */
   if((rv = bind(sockfd, srcinfo->ai_addr, srcinfo->ai_addrlen)) != 0) {
      fprintf(stderr, "bind: %s\n", strerror(errno));   // bind() reports failure through errno
      ret = 3;
      goto LBL_RET;
   }

   /*Connect this datagram socket to destination address info */
   if((rv= connect(sockfd, p->ai_addr, p->ai_addrlen)) != 0) {
      fprintf(stderr, "connect: %s\n", strerror(errno));   // connect() reports failure through errno
      ret = 3;
      goto LBL_RET;
   }

   while(1){
      FD_ZERO(&readfds);
      FD_SET(STDIN, &readfds);
      FD_SET(sockfd, &readfds);

      /*Select timeout at 10s*/
      tv.tv_sec = 10;
      tv.tv_usec = 0;
      if (select(sockfd + 1, &readfds, NULL, NULL, &tv) == -1) {
         perror("select");
         ret = 6;
         goto LBL_RET;
      }

      /*Obey your user, take his inputs*/
      if (FD_ISSET(STDIN, &readfds))
      {
         memset(buffer, 0, sizeof(buffer));
         len = 0;
         printf("A key was pressed!\n");
         if(0 >= (len = read(STDIN, buffer, sizeof(buffer) - 1)))   // leave room for the terminating NUL
         {
            perror("read STDIN");
            ret = 4;
            goto LBL_RET;
         }

         fprintf(stdout, ">>%s\n", buffer);

         /*EOM\n implies user wants to exit*/
         if(!strcmp(buffer,"EOM\n")){
            printf("Received EOM closing\n");
            break;
         }

         /*Sendall will use send to transfer to bound sockfd*/
         if (sendall(sockfd, buffer, &len) == -1) {
            perror("sendall");
            fprintf(stderr,"%s: We only sent %d bytes because of the error!\n", argv[0], len);
            ret = 5;
            goto LBL_RET;
         }  
      }

      /*We've got something on our socket to read */
      if(FD_ISSET(sockfd, &readfds))
      {
         memset(buffer, 0, sizeof(buffer));
         printf("Received something!\n");
         /*recv will use receive to connected sockfd */
         numbytes = recv(sockfd, buffer, sizeof(buffer) - 1, 0);   // leave room for the terminating NUL
         if(0 == numbytes){
            printf("Destination closed\n");
            break;
         }else if(-1 == numbytes){
            /*Could be an ICMP error from remote end*/
            perror("recv");
            printf("Receive error check your firewall settings\n");
            ret = 5;
            goto LBL_RET;
         }
         fprintf(stdout, "<<Number of bytes %d Message: %s\n", numbytes, buffer);
      }

      /*Heartbeat*/
      printf(".\n");
   }

   ret = 0;
LBL_RET:

   if(dstinfo)
      freeaddrinfo(dstinfo);

   if(srcinfo)
      freeaddrinfo(srcinfo);

   close(sockfd);

   return ret;
}
Trevor Hickey
Conrad Gomes
6

Really the key is connect():

If the socket sockfd is of type SOCK_DGRAM then addr is the address to which datagrams are sent by default, and the only address from which datagrams are received.
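The filtering described in that quote can be tried out on loopback. The following is a sketch, not a definitive implementation. It assumes IPv4, and `udp_bound`, `local_port` and `udp_connect` are hypothetical helpers. Each endpoint is bound first and then connected to its peer, which is the symmetric setup the question asks about.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a UDP socket bound to 127.0.0.1:port (0 = let the OS choose).
 * Returns the descriptor, or -1 on error. */
int udp_bound(unsigned short port)
{
    struct sockaddr_in local;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    local.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (const struct sockaddr *)&local, sizeof local) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Port the socket is bound to, in host byte order. */
unsigned short local_port(int fd)
{
    struct sockaddr_in local;
    socklen_t len = sizeof local;
    int rc = getsockname(fd, (struct sockaddr *)&local, &len);
    assert(rc == 0);
    (void)rc;
    return ntohs(local.sin_port);
}

/* Restrict fd to a single peer on loopback: that peer becomes the default
 * destination and the only source whose datagrams are delivered. */
int udp_connect(int fd, unsigned short peer_port)
{
    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(peer_port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect(fd, (const struct sockaddr *)&peer, sizeof peer);
}
```

With two endpoints `a` and `b` created by `udp_bound()`, calling `udp_connect(a, local_port(b))` and `udp_connect(b, local_port(a))` gives each side plain `send()`/`recv()` to and from the other only.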

Geoff Reedy
  • 4
    In my understanding the 'server' has to bind() regardless, in order to actually attach to a port so that the client can have a real place to send and receive data... – gct Mar 16 '12 at 18:18
  • 1
    @gct: You can send UDP packets without binding to a source port, but you can't receive them. – datenwolf Mar 16 '12 at 18:57
  • @datenwolf: sure, but both ends of my message queue need to send and receive. I could arbitrarily choose which is client and which is server, but then they both run the same code after that, so it'd be cool if I could make them completely symmetric. – gct Mar 16 '12 at 19:41
  • @gct: Just connect them on both sides. – datenwolf Mar 16 '12 at 20:32
  • 3
    @gct: Well, you can bind on both sides. – datenwolf Mar 17 '12 at 01:05
  • 1
    `connect`ed sockets are bound anyway. from connect man page: `If the socket has not already been bound to a local address, connect() shall bind it to an address which, unless the socket's address family is AF_UNIX, is an unused local address.` – daurnimator Sep 16 '15 at 06:53
1

There is a problem in your code:

memset(&hints, 0, sizeof hints);
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_DGRAM;        //UDP communication

/*For destination address*/
if ((rv = getaddrinfo(argv[1], argv[2], &hints, &dstinfo)) 

By using only AF_UNSPEC and SOCK_DGRAM, you get a list of all the possible addresses. So, when you call socket, the address you are using might not be the UDP one you expect. You should use

hints.ai_family = AF_INET;
hints.ai_socktype = SOCK_DGRAM;
hints.ai_protocol = IPPROTO_UDP;
hints.ai_flags = AI_PASSIVE;

instead to make sure the addrinfo you are retrieving is what you wanted.

In other words, the socket you created may not be a UDP socket, and that is the reason why it does not work.

Jack
0

This page contains some great info about connected versus unconnected sockets: http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch08lev1sec11.html

This quote answers your question:

Normally, it is a UDP client that calls connect, but there are applications in which the UDP server communicates with a single client for a long duration (e.g., TFTP); in this case, both the client and server can call connect.

konrad
-2

I'd look at it more from the idea of what UDP is providing. UDP is an 8 byte header which adds 2 byte send and receive ports (4 bytes total). These ports interact with Berkeley Sockets to provide your traditional socket interface. I.e. you can't bind to an address without a port or vice-versa.

Typically when you send a UDP packet the receive side port (source) is ephemeral and the send side port (destination) is your destination port on the remote computer. You can defeat this default behavior by binding first and then connecting. Now your source port and destination port would be the same so long as the same ports are free on both computers.

In general this behavior (let's call it port hijacking) is frowned upon. This is because you have just limited your send side to only being able to send from one process, as opposed to working within the ephemeral model which dynamically allocates send side source ports.

Incidentally, the other four bytes of the eight-byte UDP header, length and checksum, are pretty much totally useless as they are already provided in the IP packet and a UDP header is fixed length. Like come on people, computers are pretty good at doing a little subtraction.

Clarus
-2

I have not used connect() under UDP. I feel connect() was designed for two totally different purposes under UDP vs TCP.

The man page has some brief details on the usage of connect() under UDP:

Generally, connection-based protocol (like TCP) sockets may connect() successfully only once; connectionless protocol (like UDP) sockets may use connect() multiple times to change their association.
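A quick way to see the "change their association" behaviour is to connect the same UDP socket twice and ask for its peer each time. This is a minimal sketch assuming IPv4 on loopback. `udp_connect_loopback` and `peer_port` are hypothetical helpers. No listener is needed on either port, because connect() on a UDP socket sends nothing.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Associate a UDP socket with 127.0.0.1:port. This may be called again
 * later with a different port to replace the association. */
int udp_connect_loopback(int fd, unsigned short port)
{
    struct sockaddr_in peer;
    assert(fd >= 0);
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(port);
    peer.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return connect(fd, (const struct sockaddr *)&peer, sizeof peer);
}

/* Port of the currently associated peer in host byte order,
 * or -1 if the socket has no peer (getpeername() fails). */
int peer_port(int fd)
{
    struct sockaddr_in peer;
    socklen_t len = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == -1)
        return -1;
    return ntohs(peer.sin_port);
}
```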

Nathan Tuggy
-2

YES, you can. I do it too.

And your use case is the one where this is useful: both sides act as both client and server, and there is only one process on each side.

Droopycom
-4

If you are a C/C++ lover, you may try route_io.

It is simple to use: create an instance that accepts traffic on different ports and routes it to your function.

Example :

  void read_data(rio_request_t *req);
  void read_data(rio_request_t *req) {
  char *a = "CAUSE ERROR FREE INVALID";

  if (strncmp( (char*)req->in_buff->start, "ERROR", 5) == 0) {
    free(a);
  }
  // printf("%d,  %.*s\n", i++, (int) (req->in_buff->end - req->in_buff->start), req->in_buff->start);
  rio_write_output_buffer_l(req, req->in_buff->start, (req->in_buff->end - req->in_buff->start));
  // printf("%d,  %.*s\n", i++, (int) (req->out_buff->end - req->out_buff->start), req->out_buff->start);
}

int main(void) {

  rio_instance_t * instance = rio_create_routing_instance(24, NULL, NULL);
  rio_add_udp_fd(instance, 12345, read_data, 1024, NULL);
  rio_add_tcp_fd(instance, 3232, read_data, 64, NULL);

  rio_start(instance);

  return 0;
}
woon minika