Overview
I have a simple question with code below. Hopefully I didn't make a mistake in the code.
I'm a network engineer, and I need to test how our business application's keepalives behave on Linux during network outages (I'm going to insert some iptables rules later to interfere with the connection; first I want to make sure I got the client and server right).
As part of a network failure test I'm conducting, I wrote a non-blocking Python TCP client and server that are supposed to blindly send messages to each other in a loop. To understand what's happening I am using loop counters.
The server's loop should be relatively straightforward. I loop through every fd that select() says is ready. I never even import sleep anywhere in my server's code. From this perspective, I don't expect the server's code to pause while it loops over the client's socket, but for some reason the server code pauses intermittently (more detail below).
I initially didn't put a sleep in the client's loop. Without a sleep on the client side, the server and client seem to be as efficient as I want. However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.
My questions:
- Should I be able to write a single-threaded Python TCP server that doesn't pause when the client hits time.sleep() in the client's fd.send() loop? If so, what am I doing wrong? <- ANSWERED
- If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?
Reproducing the scenario
I'm running this on two RHEL6 Linux machines. To reproduce the issue...
- Open two different terminals.
- Save the client and server scripts in different files
- Change the shebang path to your local python (I'm using Python 2.7.15)
- Change the SERVER_HOSTNAME and SERVER_DOMAIN in the client's code to be the hostname and domain of the server you're running this on
- Start the server first, then start the client.
After the client connects, you'll see messages as shown in EXHIBIT 1 scrolling quickly in the server's terminal. After a few seconds, the scrolling pauses intermittently when the client hits time.sleep(). I don't expect to see those pauses, but maybe I've misunderstood something.
EXHIBIT 1
---
LOOP_COUNT 0
---
LOOP_COUNT 1
---
LOOP_COUNT 2
---
LOOP_COUNT 3
CLIENTMSG: 'client->server 0'
---
LOOP_COUNT 4
---
LOOP_COUNT 5
---
LOOP_COUNT 6
---
LOOP_COUNT 7
---
LOOP_COUNT 8
---
LOOP_COUNT 9
---
LOOP_COUNT 10
---
LOOP_COUNT 11
---
Summary resolution
If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?
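Answering my own question. My blocking problem was caused by calling select() with a non-zero timeout. With a non-zero timeout (5 seconds in my original server), select() blocks for up to that long whenever none of the sockets it watches is ready.
When I changed select() to use a zero-second timeout, it returned immediately and I got the expected results.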
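To illustrate the difference between the two timeouts, here is a minimal sketch. It assumes Python 3 (unlike the Python 2.7 listings in this post) and uses socket.socketpair() as a stand-in for the real TCP connection; the names timed_select, server_end and client_end are made up for this example.

```python
import select
import socket
import time


def timed_select(sock, timeout):
    """Run one select() for readability on sock; return (ready_list, elapsed_seconds)."""
    start = time.time()
    readable, _, _ = select.select([sock], [], [], timeout)
    return readable, time.time() - start


# socketpair() returns two connected sockets in one process. They stand in
# for the server's and the client's ends of the connection.
server_end, client_end = socket.socketpair()

# Nothing has been sent yet, so server_end is not readable.
blocked, blocked_secs = timed_select(server_end, 0.5)  # waits out the full timeout
polled, polled_secs = timed_select(server_end, 0.0)    # pure poll, returns right away

# Once data is pending, select() returns early even with a non-zero timeout.
client_end.send(b"client->server 0")
ready, ready_secs = timed_select(server_end, 0.5)

print("timeout=0.5, idle:    %d ready, %.2fs" % (len(blocked), blocked_secs))
print("timeout=0.0, idle:    %d ready, %.2fs" % (len(polled), polled_secs))
print("timeout=0.5, pending: %d ready, %.2fs" % (len(ready), ready_secs))

server_end.close()
client_end.close()
```

A zero timeout turns the loop into a busy poll, so it trades the pauses for constant CPU use. That is fine for a short test like this one, but probably not what you want in a long-running service.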
Final non-blocking code (incorporating suggestions in answers):
tcp_server.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
from socket import MSG_DONTWAIT
#from socket import MSG_OOB <--- for send()
from socket import socket
import socket as socket_module
import select
import errno
import fcntl
import time
import sys
import os

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))
    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ## also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0]) # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int = -1
        retval_str = '__somethingswrong__'
        retval_code = 'BADFAIL'
    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code

host = ''
port = 6667 # IRC service
DEBUG = True

serv_sock = socket(AF_INET, SOCK_STREAM)
# SO_REUSEADDR is the option name; SOCK_STREAM is a socket type, not an option
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
serv_sock.bind((host, port))
serv_sock.listen(5)
#fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK) # Make the socket non-blocking
serv_sock.setblocking(False)

sock_list = [serv_sock]
from_client_str = '__DEFAULT__'
to_client_idx = 0
loop_count = 0
need_send_select = False

while True:
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = sock_list
    else:
        send_sock_list = []
    #print "---"
    #print "LOOP_COUNT", loop_count
    recv_ready_list, send_ready_list, exception_ready = select.select(
        sock_list, send_sock_list, [], 0.0) # Last float is the select() timeout...

    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:
        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
            try:
                clientsock, clientaddr = sock_fd.accept()
            except socket_module.error, e:
                # get_errno_info() returns (int, str, code), in that order
                errint, errstr, errcode = get_errno_info(e, op='accept',
                    debugmsg=DEBUG)
                # No new client socket to register if accept() failed
                continue
            assert sock_fd.gettimeout()==0.0, "listening socket should be in non-blocking mode"
            sock_list.append(clientsock)
        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(1024, MSG_DONTWAIT)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='recv',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    # socket unavailable to read()
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Client closed the socket...
                    sock_list.remove(sock_fd)
                else:
                    print "UNHANDLED SOCKET ERROR", errcode, errint, errstr
                    sys.exit(1)
            print "from_client_str: '{0}'".format(from_client_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = sock_list
    else:
        dynamic_list = send_ready_list
    ## NOTE: socket code shouldn't walk this list unless a write is pending...
    ## broadcast the same message to all clients...
    ## Iterate over a copy, because sock_list may be modified inside the loop
    for sock_fd in list(dynamic_list):
        ## Ignore server's listening socket...
        if sock_fd is serv_sock:
            ## Only send() to accept()ed sockets...
            continue
        try:
            to_client_str = "server->client: {0}\n".format(to_client_idx)
            send_retval = sock_fd.send(to_client_str, MSG_DONTWAIT)
            ## send() returns the number of bytes written, on success
            ## disabling assert check on sent bytes while using MSG_DONTWAIT
            #assert send_retval==len(to_client_str)
            to_client_idx += 1
            need_send_select = False
        except socket_module.error, e:
            errint, errstr, errcode = get_errno_info(e, op='send',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                need_send_select = True
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Client closed the socket...
                sock_list.remove(sock_fd)
            else:
                print "FATAL UNHANDLED SOCKET ERROR", errcode, errint, errstr
                sys.exit(1)
    loop_count += 1
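As a side note on the errno handling in the server, here is a small sketch of the same lookup that get_errno_info() performs, assuming Python 3. The helper names describe_errno and is_would_block are hypothetical and exist only for this illustration.

```python
import errno
import os


def describe_errno(err_int):
    """Return (number, message, symbolic name), the same order get_errno_info() uses."""
    message = os.strerror(err_int)                  # human-readable text
    name = errno.errorcode.get(err_int, 'UNKNOWN')  # symbolic name, e.g. 'EPIPE'
    return err_int, message, name


def is_would_block(err_int):
    """True if a non-blocking call failed only because it would have blocked."""
    # Compare numbers instead of names: EAGAIN and EWOULDBLOCK can share one
    # value on some platforms, so only one name appears in errno.errorcode.
    return err_int in (errno.EAGAIN, errno.EWOULDBLOCK)


errint, errstr, errcode = describe_errno(errno.EPIPE)
print("errno {0}: {1} ({2})".format(errint, errcode, errstr))
```

Comparing the integer values is a slightly more defensive choice than comparing the name strings, which is what the listings in this post do.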
tcp_client.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM
from socket import MSG_DONTWAIT # non-blocking send/recv; see man 2 recv
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myServerHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 6667
DEBUG = True

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))
    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ## also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0]) # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int = -1
        retval_str = '__somethingswrong__'
        retval_code = 'BADFAIL'
    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code

connect_finished = False
while not connect_finished:
    try:
        c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
        #fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)
        c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
        # Set socket non-blocking (after the blocking connect() has finished)
        c2s.setblocking(False)
        assert c2s.gettimeout()==0.0, "c2s socket should be in non-blocking mode"
        connect_finished = True
    except socket_module.error, e:
        # get_errno_info() returns (int, str, code), in that order
        errint, errstr, errcode = get_errno_info(e, op='connect',
            debugmsg=DEBUG)
        if errcode=='EINPROGRESS':
            pass

to_srv_idx = 0
need_send_select = False
while True:
    socket_list = [c2s]
    # Get the list of sockets which can: take input, output, etc...
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = socket_list
    else:
        send_sock_list = []
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, send_sock_list, [])

    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"
        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(1024, MSG_DONTWAIT)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            errint, errstr, errcode = get_errno_info(e, op='recv',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                # Busy, try again later...
                print "recv() BLOCKED"
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Server ended normally...
                sys.exit(0)
        ## NOTE: if we get this far, we successfully received from_srv_str.
        ## Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = socket_list
    else:
        dynamic_list = send_ready_list
    for sock_fd in dynamic_list:
        # outgoing message to remote server
        if sock_fd is c2s:
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str, MSG_DONTWAIT)
                ##
                time.sleep(1) ## Client blocks the server here... Why????
                ##
                to_srv_idx += 1
                need_send_select = False
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='send',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    ## Try to send() later...
                    print "send() BLOCKED"
                    need_send_select = True
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Server ended normally...
                    sys.exit(0)
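The need_send_select flag in both scripts follows a common pattern: only ask select() about writability after a send() has failed with EAGAIN or EWOULDBLOCK. Below is a minimal sketch of why that matters, assuming Python 3 and a local socket.socketpair() in place of the real TCP connection (buffer sizes and exact behavior on a TCP link will differ). The names sender, receiver and chunk are invented for this example.

```python
import errno
import select
import socket

sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

# The receiver never reads, so the buffers fill up and a non-blocking
# send() eventually fails with EAGAIN / EWOULDBLOCK.
chunk = b"x" * 4096
sent_total = 0
need_send_select = False
while not need_send_select:
    try:
        sent_total += sender.send(chunk)
    except socket.error as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            need_send_select = True  # from now on, wait for writability
        else:
            raise

# With full buffers, select() does not report the socket as writable.
_, writable, _ = select.select([], [sender], [], 0.0)
writable_when_full = bool(writable)

# Drain everything on the receiving side, as a client that stops sleeping would.
drained = 0
while True:
    try:
        data = receiver.recv(65536)
    except socket.error as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            break  # nothing left to read
        raise
    drained += len(data)

# Now the sender can make progress again, and select() says so.
_, writable, _ = select.select([], [sender], [], 1.0)
writable_after_drain = bool(writable)

print("sent %d bytes before EAGAIN, drained %d" % (sent_total, drained))
print("writable while full: %s, after drain: %s" % (writable_when_full, writable_after_drain))

sender.close()
receiver.close()
```

This is the situation a sleeping client can put the server in: if the peer stops reading for long enough, the server's socket stops being writable, and a select() call with a non-zero timeout then has nothing ready and waits.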
Original Question Code:
tcp_server.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
#from socket import MSG_OOB <--- for send()
from socket import socket
import socket as socket_module
import select
import fcntl
import os

host = ''
port = 9997

serv_sock = socket(AF_INET, SOCK_STREAM)
# SO_REUSEADDR is the option name; SOCK_STREAM is a socket type, not an option
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
serv_sock.bind((host, port))
serv_sock.listen(5)
fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK) # Make the socket non-blocking

sock_list = [serv_sock]
from_client_str = '__DEFAULT__'
to_client_idx = 0
loop_count = 0

while True:
    # The last argument is the select() timeout, in seconds
    recv_ready_list, send_ready_list, exception_ready = select.select(sock_list, sock_list,
        [], 5)
    print "---"
    print "LOOP_COUNT", loop_count
    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:
        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
            clientsock, clientaddr = sock_fd.accept()
            sock_list.append(clientsock)
        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(4096)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                print "WARNING RECV FAIL"
            print "from_client_str: '{0}'".format(from_client_str)
    for sock_fd in send_ready_list:
        if sock_fd is not serv_sock:
            try:
                to_client_str = "server->client: {0}\n".format(to_client_idx)
                sock_fd.send(to_client_str)
                to_client_idx += 1
            except socket_module.error, e:
                print "TO CLIENT SEND ERROR", e
    loop_count += 1
tcp_client.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 9997

def handle_socket_error_continue(e):
    ## non-blocking socket info from:
    ## https://stackoverflow.com/a/16745561/667301
    print "HANDLE_SOCKET_ERROR_CONTINUE"
    err = e.args[0]
    if (err==errno.EAGAIN) or (err==errno.EWOULDBLOCK):
        print 'CLIENT DEBUG: No data input from server'
        return True
    else:
        print 'FROM SERVER RECV ERROR: {0}'.format(e)
        sys.exit(1)

c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
# Set socket non-blocking...
fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)

to_srv_idx = 0
while True:
    socket_list = [c2s]
    # Get the list of sockets which can: take input, output, etc...
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, socket_list, [])
    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"
        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(4096)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            err_continue = handle_socket_error_continue(e)
            if err_continue is True:
                continue
        else:
            if len(from_srv_str)==0:
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)
        ## NOTE: if we get this far, we successfully received from_srv_str.
        ## Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)
    for sock_fd in send_ready_list:
        #outgoing message to remote server
        if sock_fd is c2s:
            #to_srv_str = raw_input('Send to server: ')
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str)
                ##
                time.sleep(1) ## Client blocks the server here... Why????
                ##
                to_srv_idx += 1
            except socket_module.error, e:
                print "TO SERVER SEND ERROR", e