
Hi, I have a script that keeps polling the port state of my devices; this is a simplified version.

When a connection succeeds (the device exists) I close it, but the connection then sits in the TIME_WAIT state. Over time these connections pile up and eventually reach the maximum number allowed by the OS (if I remember correctly).

Any idea which part I should fix? I use port 53 as an example, but in the real app I check multiple ports such as SSH, VNC, etc.

I run the script on Ubuntu 18.04 with Python 3.5.6.

import asyncio
import ipaddress
import sys

async def check_port(ip, port, timeout=1):
    conn = None
    response = False
    writer = None

    try:
        conn = asyncio.open_connection(ip, port)
        reader, writer = await asyncio.wait_for(conn, timeout=timeout)
        response = True
    except asyncio.CancelledError:
        print("asyncio cancel")
    except:
        response = False
    finally:
        if writer is not None:
            writer.close()
        if conn is not None:
            conn.close()
        print("Closing connection {}:{}".format(ip, port))

    print("{}:{} {}".format(ip, port, response))

async def poll_status():
    ips = [str(ip) for ip in ipaddress.IPv4Network("192.168.1.0/24")]
    while True:
        try:
            tasks = [check_port(ip, 53) for ip in ips]
            await asyncio.wait(tasks)
        except asyncio.CancelledError:
            break
        except KeyboardInterrupt:
            break
        except:
            pass
        await asyncio.sleep(1)

async def shutdown(task):
    task.cancel()
    await task
    await asyncio.sleep(1)

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    task = asyncio.ensure_future(poll_status())
    try:
        loop.run_forever()
    except:
        pass
    finally:
        loop.run_until_complete(asyncio.wait([shutdown(task)]))
        loop.close()

Connections keep piling up like this (output from "netstat -nput | grep TIME_WAIT"). 192.168.1.1 is my router, so the port check succeeds but leaves a lot of connections in TIME_WAIT, and it takes a long time for them to be removed.

tcp        0      0 192.168.1.4:42102       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:42582       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:46560       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:39428       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:45806       192.168.1.1:53          TIME_WAIT   -                                     
tcp        0      0 192.168.1.4:44752       192.168.1.1:53          TIME_WAIT   -                                      
tcp        0      0 192.168.1.4:40726       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:49864       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:38812       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:48464       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:41372       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:43408       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:47360       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:45478       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:41904       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:40160       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:46196       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:48744       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:49554       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:47774       192.168.1.1:53          TIME_WAIT   -                   
tcp        0      0 192.168.1.4:39370       192.168.1.1:53          TIME_WAIT   -                                    
tcp        0      0 192.168.1.4:43994       192.168.1.1:53          TIME_WAIT   - 

1 Answer

I'm not an expert in networking, so I'm not sure this answer will help, but here are my two cents.

First, about the netstat output: it reflects how TCP closes connections and seems unrelated to your OS limits. A quick search turns up the following:

TIME_WAIT indicates that the local endpoint (this side) has closed the connection. The connection is kept around so that any delayed packets can be matched to it and handled appropriately. The connections are removed when they time out, within four minutes.

So it seems your code did close the connections, i.e. it is doing everything right.

However, I have no idea how the router will handle an increasing number of such connections.
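As a side note, if the TIME_WAIT buildup itself is the problem, one common (if blunt) trick is to set SO_LINGER with a zero timeout before closing: close() then sends an RST instead of a FIN, so the socket skips TIME_WAIT entirely. This is not from the answer above, just a sketch; it assumes Python 3.7+ (asyncio.run), unlike your 3.5, and a hypothetical check_port variant:

```python
import asyncio
import socket
import struct

async def check_port(ip, port, timeout=1):
    """Connect, then abort with RST on close so no TIME_WAIT is left behind."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(ip, port), timeout=timeout)
    except (OSError, asyncio.TimeoutError):
        return False
    sock = writer.get_extra_info('socket')
    if sock is not None:
        # SO_LINGER with a zero timeout makes close() send RST instead of FIN,
        # skipping TIME_WAIT at the cost of an abortive (non-graceful) close.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                        struct.pack('ii', 1, 0))
    writer.close()
    return True

async def main():
    # Throwaway local server so the sketch can be run as-is.
    server = await asyncio.start_server(lambda r, w: w.close(), '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    up = await check_port('127.0.0.1', port)
    down = await check_port('127.0.0.1', 1)  # port 1 is almost surely closed
    server.close()
    await server.wait_closed()
    return up, down
```

The trade-off is that an RST close is not a graceful shutdown, but for a pure "is the port open" probe that usually does not matter.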


Now let's consider your code.

What you are doing with asyncio.wait(tasks) is running all your checks in parallel. Depending on the number of IPs, that can be too much. There's a high chance you'd benefit from using asyncio.Semaphore to limit the maximum number of parallel checks. It would look like this:

sem = asyncio.Semaphore(100)

async def check_port(ip, port, timeout=1):
    async with sem:
        # main code here

You can also read this answer to see a real example of using a semaphore.
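Here is a minimal self-contained sketch of that pattern, passing the semaphore in explicitly instead of using a global, and using a throwaway local server so it can be run as-is (the names here are illustrative, not from the question's code):

```python
import asyncio

async def check_port(sem, ip, port, timeout=1):
    # The semaphore caps how many connection attempts are in flight at once.
    async with sem:
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(ip, port), timeout=timeout)
        except (OSError, asyncio.TimeoutError):
            return False
        writer.close()
        return True

async def main():
    sem = asyncio.Semaphore(100)  # at most 100 parallel checks
    server = await asyncio.start_server(lambda r, w: w.close(), '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    results = await asyncio.gather(
        *(check_port(sem, '127.0.0.1', port) for _ in range(10)))
    server.close()
    await server.wait_closed()
    return results
```

Even with thousands of IPs queued, at most 100 sockets are open at any moment, which keeps file-descriptor and ephemeral-port usage bounded.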


The next thing you may need to fix is how you handle CancelledError:

except asyncio.CancelledError:
    print("asyncio cancel")

This is not how a task should react to this exception: the task itself should never suppress it; only the outer code may do that. Read this answer to see how to do it.
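A small runnable illustration of that rule: the task re-raises after any local cleanup, and only the code that awaits the task suppresses the cancellation (the worker here is a stand-in, not the question's coroutine):

```python
import asyncio

async def worker():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Do any local cleanup here, then re-raise: a task must not
        # swallow its own CancelledError.
        raise

async def main():
    task = asyncio.ensure_future(worker())
    await asyncio.sleep(0.1)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        # Only the outer, awaiting code suppresses the cancellation.
        return "cancelled"
    return "not cancelled"
```

If worker() caught CancelledError without re-raising, task.cancel() would silently fail and main() would hang until the sleep finished.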


except:
    response = False

You should never do that. Please read this topic for details.

Instead, do at least something like the following:

except Exception as exc:    # catch Exception or its subclasses only
    logging.exception(exc)  # log it, so as not to miss an exception you could fix
    response = False
Mikhail Gerasimov
  • Regarding the semaphore: I use that in my actual code. It works like a pool and helps manage all awaited requests, whether coroutine functions or not (run with an executor). It is very good, two thumbs up :D – wpsd Mar 08 '19 at 23:17
  • The asyncio.CancelledError handler is mainly for debugging; sometimes cancellation goes missing, so I put a handler in every coroutine function to test :D, my bad. I do have logging everywhere in the real app for exceptions; I just removed it from this example for simplicity. – wpsd Mar 08 '19 at 23:20
  • I read about TIME_WAIT: it actually hits a limit at around 65k connections (when all source ports are used up). Each new connection uses a different source port than the ones stuck in TIME_WAIT, which pile up over time. I don't think SO_REUSEADDR fits my scenario, because I have to check the same device (same destination IP and port) multiple times. – wpsd Mar 08 '19 at 23:23
  • In the actual app I check multiple ports and other things for a single device, and each check opens a connection; I'm using an agent-less approach, so reaching that 65k is possible. The solution I can think of is delaying the next iteration by a minute (TIME_WAIT entries are removed after about a minute), but that would defeat the purpose... – wpsd Mar 08 '19 at 23:28