4

I have a network client that is stuck in recvfrom, waiting on a server that is not under my control and that, after 24+ hours, is probably never going to respond. The program has processed a great deal of data, so I don't want to kill it; I want it to abandon the current connection and proceed. (It will do so correctly if recvfrom returns EOF or -1.) I have already tried several programs that purport to disconnect stale TCP connections by forging RSTs (tcpkill, cutter, killcx); none had any effect, and the program remained stuck in recvfrom. I have also tried taking the network interface down; again, no effect.

It seems to me that there really should be a way to force a disconnect at the socket-API level without forging network packets. I do not mind horrible hacks, up to and including poking kernel data structures by hand; this is a disaster-recovery situation. Any suggestions?

(For clarity, the TCP channel at issue here is in ESTABLISHED state according to lsof.)

zwol
  • Does this help? http://stackoverflow.com/questions/6389970/unblock-recvfrom-when-socket-is-closed You'd somehow need to shut down the socket. Maybe inject code into the running process, or duplicate the handle into a process you control. – usr Sep 25 '13 at 21:42
  • Try powering off the network switch the computer is connected to. I seem to remember that this may reset/close the connection. – MTilsted Sep 25 '13 at 21:45
  • @MTilsted Highly unlikely - as long as no data is sent, the socket will survive for a long time, regardless of actual network connectivity. Plus, the question does say *I have also tried taking the network interface down*. – cnicutar Sep 25 '13 at 22:04
  • You want to force a FIN, not an RST, for the program to see an EOS. – user207421 Sep 25 '13 at 23:27

2 Answers

5

"I do not mind horrible hacks"

That's all you have to say. I am guessing the tools you tried didn't work because they sniff traffic to get an acceptable ACK number to kill the connection. Without traffic flowing they have no way to get hold of it.

Here are things you can try:

Probe all the sequence numbers

Where those tools failed, you can still do it by brute force. Write a simple Python script using scapy that sends, for each candidate sequence number, a RST segment with the correct 4-tuple (addresses and ports). There are at most about 4 billion (2^32) candidates, and in practice far fewer: assuming a decent window, you can step through the sequence space one window at a time. You can find out the window for free using ss -i.
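To put a number on "fewer", here is a minimal sketch of the arithmetic in C. The helper names rst_probes_needed and rst_probe_seq are hypothetical, and the sketch assumes classic RFC 793 behaviour, where any RST whose sequence number falls inside the receive window is accepted. Stacks that implement RFC 5961 reset only on an exact match with the expected sequence number and answer other in-window RSTs with a challenge ACK, so treat the result as a lower bound there. Actually sending the segments is still left to scapy.

```c
#include <stdint.h>

/* Size of the TCP sequence space: 2^32. */
#define SEQ_SPACE (UINT64_C(1) << 32)

/*
 * Hypothetical helper: number of RST probes needed so that at least one
 * falls inside a receive window of `window` bytes, when the sequence
 * number is advanced by `window` between probes.
 */
static uint64_t rst_probes_needed(uint32_t window)
{
    if (window == 0)
        return SEQ_SPACE;                      /* only an exact hit can work */
    return (SEQ_SPACE + window - 1) / window;  /* ceiling division */
}

/*
 * Sequence number of the i-th probe, starting from `start` and wrapping
 * modulo 2^32 (the truncating cast does the wrap).
 */
static uint32_t rst_probe_seq(uint32_t start, uint32_t window, uint64_t i)
{
    return (uint32_t)(start + i * (uint64_t)window);
}
```

With a 64 KiB window (65536 bytes), rst_probes_needed gives 65536 probes rather than roughly 4.3 billion, which is why reading the window from ss -i first is worth the effort.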

Make a kernel module to get hold of the socket

  • Write a kernel module that walks the list of TCP sockets: look for sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[i].chain)

  • Identify your victim sk

At this point you have intimate access to your socket, so:

  • You can call tcp_reset or tcp_disconnect on it. You won't be able to call tcp_reset directly (it is not exported with EXPORT_SYMBOL), but you should be able to mimic it: most of the functions it calls are exported.

  • Or you can read the expected ACK number from tcp_sk(sk) and forge a RST packet directly with scapy.


Here is a function I use to print established sockets. I scrounged bits and pieces from the kernel to put it together some time ago:

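#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <net/inet_hashtables.h>
#include <net/inet_sock.h>
#include <net/sock.h>
#include <net/tcp.h>

/* Expand an IPv4 address (network byte order) into its four octets. */
#define NIPQUAD(addr) \
    ((unsigned char *)&(addr))[0], \
    ((unsigned char *)&(addr))[1], \
    ((unsigned char *)&(addr))[2], \
    ((unsigned char *)&(addr))[3]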

#define NIPQUAD_FMT "%u.%u.%u.%u"


extern struct inet_hashinfo tcp_hashinfo;

/* Decides whether a bucket has any sockets in it. */
static inline bool empty_bucket(int i)
{
    return hlist_nulls_empty(&tcp_hashinfo.ehash[i].chain);
}

void print_tcp_socks(void)
{
    int i = 0;
    struct inet_sock *inet;

    /* Walk the established-socket hash table, locking each non-empty bucket. */
    printk("Established ---\n");
    for (i = 0; i <= tcp_hashinfo.ehash_mask; i++) {
        struct sock *sk;
        struct hlist_nulls_node *node;
        spinlock_t *lock = inet_ehash_lockp(&tcp_hashinfo, i);

        /* Lockless fast path for the common case of empty buckets */
        if (empty_bucket(i))
            continue;

        spin_lock_bh(lock);
        sk_nulls_for_each(sk, node, &tcp_hashinfo.ehash[i].chain) {
            if (sk->sk_family != PF_INET)
                continue;

            inet = inet_sk(sk);

            printk(NIPQUAD_FMT ":%hu ---> " NIPQUAD_FMT ":%hu\n",
                   NIPQUAD(inet->inet_saddr), ntohs(inet->inet_sport),
                   NIPQUAD(inet->inet_daddr), ntohs(inet->inet_dport));
        }
        spin_unlock_bh(lock);
    }
}

You should be able to drop this into a simple "Hello World" module; after you insmod it, dmesg will list the established sockets (much like ss or netstat).

cnicutar
1

As I understand it, you want to automate the process in order to test it. But if you just want to check that the recvfrom error is handled correctly, you could attach to the process with GDB and close the fd with a close() call.

Here you can see an example.
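Whether close() alone wakes a thread that is already blocked in recvfrom is not obvious, so it may be safer to call shutdown() on the socket instead. The following is a minimal sketch, assuming Linux and a working loopback interface, of what a blocked recv() sees when another process that shares the socket calls shutdown() on it. The helper name recv_after_shutdown is hypothetical; the idle loopback connection stands in for the stuck one, and the forked child stands in for GDB or any process holding a duplicate of the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: blocks in recv() on an idle loopback TCP connection
 * while a forked child, which shares the same socket, calls shutdown() on it.
 * Returns whatever recv() returned, or -2 if the setup failed.
 */
static long recv_after_shutdown(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    long result = -2;
    int lfd = -1, cfd = -1, afd = -1;
    pid_t pid;

    /* Listener on 127.0.0.1 with a kernel-chosen port. */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -2;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Client socket: the stand-in for the stuck connection. */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);  /* the "server" that never sends anything */
    if (afd < 0)
        goto out;

    pid = fork();
    if (pid < 0)
        goto out;
    if (pid == 0) {
        /* Child: give the parent time to block, then shut the socket down. */
        sleep(1);
        shutdown(cfd, SHUT_RDWR);
        _exit(0);
    }

    /* Parent: blocks here until the child's shutdown() wakes it up. */
    result = (long)recv(cfd, buf, sizeof(buf), 0);
    waitpid(pid, NULL, 0);

out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

On Linux I would expect this to return 0, that is, recv() reports end-of-file, which is one of the outcomes the question says the client handles. Against the real process, the equivalent from GDB would be something like call (int)shutdown(fd, 2), where fd is the descriptor number reported by lsof and 2 is the value of SHUT_RDWR on Linux; both details are assumptions to verify on the target system.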

Another option is to use scapy to craft proper RST packets (scapy is not in your list). This is how I tested connection resets in a bridged system (IMHO it is the best option); you could also implement a graceful shutdown this way.

Here is an example of the scapy script.

  • Closing the descriptor from `GDB` is a good idea, but what happens with the ongoing `recvfrom` system call? Will it unblock and return `0` or `-1`? Is it safe to pull a socket out from under an ongoing system call? – cnicutar Sep 26 '13 at 08:59
  • I suspect that it will return with EBADF, but I'm not completely sure. I've checked it with event loops and non-blocking sockets, and epoll fires an event. The best option is to test it. – Jon Ander Ortiz Durántez Sep 26 '13 at 09:08