
I am new to asynchronous programming in Python. I am enhancing part of my code to work asynchronously while the rest of the code stays synchronous. The part I am making asynchronous uploads n files to an SFTP server and an S3 bucket. The code I have written works well for smaller sets of records, say 100 to 300, but when the record set grows to 1000 or more I get the error "OSError: [Errno 24] Too many open files" from both upload functions. Can anyone help me understand, in general, how async works here and identify the issue?

I am using the asyncssh and aiobotocore libraries for this code.

import asyncio

import asyncssh
from aiobotocore import session


class my_class():

    def event_loop(self):

        tuple_of_records = () ##my files are kept in this tuple
        loop = asyncio.get_event_loop()
        loop.run_until_complete(self.iterate_asynchronously(tuple_of_records))
        loop.close()

    async def iterate_asynchronously(self,tuple_of_records): ##processing files in batch
        batch_size = 100
        batches = [tuple_of_records[i:i+batch_size] for i in range(0, len(tuple_of_records), batch_size)]

        batch_tasks = [self.batch_process(batch) for batch in batches]
        await asyncio.gather(*batch_tasks)

    async def batch_process(self,batch):
        tasks = [self.my_method(files) for files in batch]
        await asyncio.gather(*tasks)


    async def my_method(self, files):
        # assuming each record is a (source_file, destination_file) pair
        source_file, destination_file = files
        await asyncio.gather(self.sftp_uploader(source_file, destination_file),
                             self.s3_uploader(source_file, destination_file))

    async def sftp_uploader(self, source_file, destination_file):
        async with asyncssh.connect(
                host="hostname",
                username="uname",
                password="pwd",
                known_hosts=None) as sftp_connection:
            async with sftp_connection.start_sftp_client() as sftp_client:
                await sftp_client.put(source_file, destination_file)

    async def s3_uploader(self, source_file, destination_file):
        ses = session.get_session()
        async with ses.create_client('s3', aws_secret_access_key='key', aws_access_key_id='id') as s3_client:
            with open(source_file, 'rb') as file:
                await s3_client.put_object(Bucket='bucket_name', Key=destination_file, Body=file)

if __name__ == "__main__":
    o = my_class()
    o.event_loop()

Here is the error I am receiving:

Traceback (most recent call last):
  File "My_code.py", line 455, in sftp_uploader
    async with asyncssh.connect(
  File "/usr/local/lib/python3.8/dist-packages/asyncssh/misc.py", line 220, in __aenter__
    self._result = await self._coro
  File "/usr/local/lib/python3.8/dist-packages/asyncssh/connection.py", line 6854, in connect
    return await _connect(options, loop, flags, conn_factory,
  File "/usr/local/lib/python3.8/dist-packages/asyncssh/connection.py", line 297, in _connect
    _, conn = await loop.create_connection(conn_factory, host, port,
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1025, in create_connection
    raise exceptions[0]
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1010, in create_connection
    sock = await self._connect_sock(
  File "/usr/lib/python3.8/asyncio/base_events.py", line 907, in _connect_sock
    sock = socket.socket(family=family, type=type_, proto=proto)
  File "/usr/lib/python3.8/socket.py", line 231, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
OSError: [Errno 24] Too many open files
  • Can you post the relevant code? Please make a minimal reproducible example – Matteo Zanoni Aug 16 '23 at 13:59
  • Please edit your question to provide a minimal reproducible example. We can't help you if we can't see what you've tried! – JRiggles Aug 16 '23 at 13:59
  • Please provide the complete traceback. This likely has nothing to do with async. The OS has a limit to the number of open files and this limit is exceeded. What is the host OS? – Michael Ruth Aug 16 '23 at 14:00
  • Also provide the output of `ulimit -n`, `ulimit -u`, and `sysctl fs.file-max` - assuming Linux. – Michael Ruth Aug 16 '23 at 14:04
  • Please provide enough code so others can better understand or reproduce the problem. – Community Aug 16 '23 at 14:15
  • I have shared the relevant code and the error received. @MichaelRuth I am using a Windows 11 machine and `ulimit -n` is currently 10000 – Tim Aug 16 '23 at 14:46
  • How about `ulimit -u`? The process is limited to 10000 file handles, but the user may have a lower limit. – Michael Ruth Aug 16 '23 at 15:41
  • You create an ssh connection per file transfer. If you opened that connection once and used it multiple times, that would reduce the number of sockets used dramatically and you'd have less of an impact on the server (authentication is fairly resource intensive). – tdelaney Aug 16 '23 at 15:46
  • @nonDucor I can see no improvement from your edit. Indenting code by four spaces is equivalent to embedding it within triple backticks (provided the question's tag correctly identifies the language). – tripleee Aug 16 '23 at 15:50
  • @tripleee, I just fixed `Class` to `class` on the beginning of the code block, so it is at least less wrong (there are still indentation errors, but this I cannot fix as there is some code missing). – nonDucor Aug 16 '23 at 15:53
  • It could be that a bunch of sockets are in timed-wait (a period after closing the socket where the endpoint is held in reserve), taking kernel resources. Reducing the number of connections would help with that. – tdelaney Aug 16 '23 at 16:18
  • @tdelaney, I first tried what you said: I created one connection object `sftp_client` and passed it as an argument into each call so `put()` is reused for every record. That reduces the number of connections, but it resulted in the error "asyncssh.sftp.SFTPFailure: Too many open files in this session, maximum 100". According to ```https://repost.aws/questions/QUylHqUiMTS2-IRcl5tGbknw/aws-sftp-error-too-many-open-files-in-this-session-maximum-100``` we can't keep more than 100 files open in one session, hence I had to go with the approach above (see the sketch after these comments). – Tim Aug 16 '23 at 16:54
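Putting the two comment threads together, here is a minimal sketch (not from the question) of what connection reuse could look like while staying under the 100-open-files-per-session limit mentioned above: one SSH/SFTP connection and one S3 client are opened once and shared by all uploads, and an asyncio.Semaphore caps how many uploads run at the same time. The hostname, credentials, bucket name, record format, and the limit of 50 are placeholder assumptions, not values taken from the question.

```
import asyncio

import asyncssh
from aiobotocore.session import get_session

# Placeholder: stay safely below the SFTP server's 100-open-files-per-session limit.
MAX_CONCURRENT_UPLOADS = 50


async def upload_all(records):
    # records is assumed to be an iterable of (source_file, destination_file) pairs.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)
    ses = get_session()

    # Open the SSH connection, the SFTP client, and the S3 client once and reuse them.
    # AWS credentials are resolved from the usual environment/config here.
    async with asyncssh.connect("hostname", username="uname", password="pwd",
                                known_hosts=None) as ssh_conn, \
               ssh_conn.start_sftp_client() as sftp_client, \
               ses.create_client("s3") as s3_client:

        async def upload_one(source_file, destination_file):
            # The semaphore bounds how many files/sockets are open at the same time.
            async with semaphore:
                await sftp_client.put(source_file, destination_file)
                with open(source_file, "rb") as body:
                    await s3_client.put_object(Bucket="bucket_name",
                                               Key=destination_file, Body=body)

        await asyncio.gather(*(upload_one(src, dst) for src, dst in records))


# Example call (placeholder file names):
# asyncio.run(upload_all([("local_a.txt", "remote_a.txt"), ("local_b.txt", "remote_b.txt")]))
```

Batching could still be layered on top, but the semaphore alone already bounds the number of simultaneously open sockets and file handles, which is what the EMFILE error is about.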
