I am porting a simple Python 3 script to AWS Lambda. The script is simple: it gathers information from a dozen S3 objects and returns the results.
The script used `multiprocessing.Pool` to gather all the files in parallel. However, `multiprocessing` cannot be used in an AWS Lambda environment, since `/dev/shm` is missing there.
So instead of writing a dirty `multiprocessing.Process` / `multiprocessing.Queue` replacement, I thought I would try `asyncio` instead.
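For what it's worth, the dirty replacement I wanted to avoid would have looked roughly like this. This is only a sketch: `multiprocessing.Pipe` is, as far as I know, usable on Lambda because (unlike `Queue`) it doesn't rely on the `/dev/shm`-backed semaphores, and the `worker` function here is a dummy stand-in for the real `get_object` call:

```python
from multiprocessing import Pipe, Process


def worker(conn, key):
    # Stand-in for s3.get_object(Bucket=..., Key=key)['Body'].read();
    # here we just echo the key back through the pipe.
    conn.send(('downloaded', key))
    conn.close()


def fetch_all(keys):
    """Spawn one process per key and collect results over Pipes."""
    procs = []
    for key in keys:
        parent_conn, child_conn = Pipe()
        p = Process(target=worker, args=(child_conn, key))
        p.start()
        procs.append((p, parent_conn))
    results = [conn.recv() for _, conn in procs]
    for p, _ in procs:
        p.join()
    return results
```

One process per object is obviously wasteful, which is part of why `asyncio` looked more attractive.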
I am using the latest version of `aioboto3` (8.0.5) on Python 3.8.
My problem is that I cannot seem to gain any improvement between a naive sequential download of the files and an asyncio event loop multiplexing the downloads.
Here are the two versions of my code.
import sys
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

import boto3
import aioboto3

BUCKET = 'some-bucket'
KEYS = [
    'some/key/1',
    [...]
    'some/key/10',
]

async def download_aio():
    """Concurrent download of all objects from S3"""
    async with aioboto3.client('s3') as s3:
        objects = [s3.get_object(Bucket=BUCKET, Key=k) for k in KEYS]
        objects = await asyncio.gather(*objects)
        buffers = await asyncio.gather(*[o['Body'].read() for o in objects])

def download():
    """Sequentially download all objects from S3"""
    s3 = boto3.client('s3')
    for key in KEYS:
        object = s3.get_object(Bucket=BUCKET, Key=key)
        object['Body'].read()

def run_sequential():
    download()

def run_concurrent():
    loop = asyncio.get_event_loop()
    # loop.set_default_executor(ProcessPoolExecutor(10))
    # loop.set_default_executor(ThreadPoolExecutor(10))
    loop.run_until_complete(download_aio())
The timings for both `run_sequential()` and `run_concurrent()` are quite similar (~3 seconds for a dozen 10 MB files).
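The timings come from a trivial wrapper along these lines (the `timed` helper is illustrative, not part of the actual script):

```python
import time


def timed(fn):
    """Run a zero-argument callable and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


# e.g.: elapsed, _ = timed(run_sequential)
#       elapsed, _ = timed(run_concurrent)
```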
I am convinced the concurrent version is not actually running concurrently, for multiple reasons:
- I tried switching to `ProcessPoolExecutor`/`ThreadPoolExecutor`, and I see the processes/threads spawned for the duration of the function, though they are doing nothing
- The timing between sequential and concurrent is very close to the same, though my network interface is definitely not saturated, and the CPU is not bound either
- The time taken by the concurrent version increases linearly with the number of files.
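As a sanity check, the same `asyncio.gather` pattern with `asyncio.sleep` standing in for the S3 calls does overlap as expected: ten 0.1 s waits complete in roughly 0.1 s total, not 1 s, so the pattern itself seems fine and the blocking must be inside the awaited calls:

```python
import asyncio
import time


async def fake_download(key):
    # Stand-in for an S3 get_object + read: a pure 0.1 s wait.
    await asyncio.sleep(0.1)
    return key


async def main():
    # Same fan-out pattern as download_aio(), but with fake I/O.
    return await asyncio.gather(*(fake_download(k) for k in range(10)))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
# elapsed is close to 0.1 s because the ten sleeps run concurrently
```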
I am sure something is missing, but I just can't wrap my head around what.
Any ideas?