I have the following code in AWS Lambda that polls an API until the status is complete. I use ThreadPoolExecutor from concurrent.futures.

Here is the sample code.

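import json
import concurrent.futures

import requests

# URL, headers and data are defined elsewhere in the Lambda (not shown).

def copy_url(headers, data):
    collection_status = 'INITIATED'
    retries = 0
    print("The data to be copied is", data)
    # Poll until the collection completes or the retry budget runs out.
    while collection_status != 'COMPLETED' and retries <= 50:
        r = requests.post(
            url=URL,
            headers=headers,
            data=json.dumps(data))
        # Take the last entry of the 'status' list as the current status.
        collection_status = r.json().get('status').pop().get('status')
        retries += 1
        print("The collection status is", collection_status)
    return collection_status


with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future = executor.submit(copy_url, headers, data)
    # result() blocks until copy_url returns.
    return_value = future.result()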

I had already implemented this using regular threads in Python. However, since I wanted a return value from the thread, I tried implementing it with ThreadPoolExecutor. Though this works perfectly in PyCharm, it always throws a timeout error in AWS Lambda.

Could someone please explain why this happens only in AWS Lambda?

Note: I have already tried increasing the Lambda timeout value. This happens only when ThreadPoolExecutor is used. When I comment out that code, it works fine. It also works fine with the regular Python thread implementation.

DineshKumar
  • Thanks for documenting this problem - faced the same issue. – ikamen Oct 17 '20 at 20:16
  • My issue was resolved after increasing the timeout from 3 seconds to 10 seconds - the new code took a long time the first time it ran, but subsequent runs were under 3 seconds. Without letting it complete once, it was always timing out. – Robert McPythons Oct 19 '20 at 10:43
  • @RobertMcPythons Thanks for sharing. In my case, the API response took longer (around 10 minutes) to complete, which I can't afford as I have to send a response back to the API Gateway connected to my Lambda. That's when I tried multiprocessing, which didn't work either, for the reason I have shared in my answer. So I had to change the design. – DineshKumar Oct 19 '20 at 13:22

2 Answers

Finally, I changed the implementation to listen to an SQS trigger rather than wait for the response from the API. (The API is handled by a different component and the response takes a significant amount of time.)
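
As a rough illustration of that design, here is a minimal sketch of a Lambda handler invoked by an SQS trigger. It assumes the other component publishes a JSON message when the collection finishes; the field names `status` and `collectionId` are hypothetical and not taken from the real API.

```python
import json


def lambda_handler(event, context):
    """Collect the IDs of collections reported as completed in this SQS batch."""
    completed = []
    # An SQS-triggered Lambda receives its messages under the "Records" key.
    for record in event.get("Records", []):
        # The message payload arrives as a string in "body".
        message = json.loads(record["body"])
        # "status" and "collectionId" are hypothetical field names.
        if message.get("status") == "COMPLETED":
            completed.append(message.get("collectionId"))
    return {"completed": completed}
```

With this shape the function runs only when a message arrives, so nothing has to block while the API finishes its work.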

It looks like we should avoid parallel processing with Python's multiprocessing in AWS Lambda.

From the AWS docs:

The multiprocessing module that comes with Python lets you run multiple processes in parallel. Due to the Lambda execution environment not having /dev/shm (shared memory for processes) support, you can’t use multiprocessing.Queue or multiprocessing.Pool.

If multiprocessing has to be used, only Pipe (multiprocessing.Pipe) is supported.
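
For reference, a minimal sketch of the Process-plus-Pipe pattern, which avoids Queue and Pool. The worker `_square_worker` and the helper `run_in_processes` are hypothetical names used only for illustration, and the sketch assumes the "fork" start method, which is available on Linux.

```python
import multiprocessing

# Assumption: the "fork" start method (available on Linux, which Lambda runs on).
_ctx = multiprocessing.get_context("fork")


def _square_worker(value, conn):
    """Hypothetical task: send value squared back through the pipe."""
    conn.send(value * value)
    conn.close()


def run_in_processes(values):
    """Run one child process per value and gather the results over pipes."""
    jobs = []
    for value in values:
        parent_conn, child_conn = _ctx.Pipe()
        proc = _ctx.Process(target=_square_worker, args=(value, child_conn))
        proc.start()
        child_conn.close()  # the parent only reads from its own end
        jobs.append((proc, parent_conn))

    results = []
    for proc, parent_conn in jobs:
        results.append(parent_conn.recv())  # read before join to avoid blocking
        proc.join()
    return results
```

Each child gets its own pipe, so results come back in the order the processes were started.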

DineshKumar

The question is about multithreaded execution, but the AWS documentation quoted in the answer is about multiprocessing. They are different implementations:

  • Multiprocessing starts a new child process to execute the operation.
  • Multithreading creates a new thread in the same process to execute the operation.
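
The difference can be observed through process IDs. The sketch below is only an illustration: it uses ThreadPoolExecutor for the thread case and, as a stand-in for multiprocessing, launches a separate Python interpreter with subprocess for the child-process case. The function names are hypothetical.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def current_pid():
    """Return the ID of the process this code is running in."""
    return os.getpid()


def pid_seen_by_thread():
    """Run current_pid in a worker thread and return what it reports."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(current_pid).result()


def pid_seen_by_child_process():
    """Run a separate Python interpreter and return the PID it reports."""
    completed = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(completed.stdout.strip())
```

The thread reports the same PID as the caller, while the child process reports a different one.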

More information in this answer: Multiprocessing vs Threading Python

balq