
I have a relatively simple linear regression lambda in AWS. Each time the function is called, the logs display the following:

/opt/python/sklearn/externals/joblib/_multiprocessing_helpers.py:38: UserWarning: [Errno 38] Function not implemented. joblib will operate in serial mode
warnings.warn('%s. joblib will operate in serial mode' % (e,))

I suspect this is due to sklearn running on a lambda (i.e. 'serverless') and trying to determine its multiprocessing capabilities, as per this question and this GH issue.

I also understand from the GH issue that this is not a 'fixable' issue; it will always happen when deploying with these dependencies on this hardware. I am getting back my expected results (even though I am currently maxing out the default, minimum Lambda memory of 128 MB).

I aim to control the warnings and would like to know if there is a way to either:

  • stop sklearn looking for multiprocessing, so preventing the warning from issuing
  • capture this specific warning and prevent it from being passed from my function into the cloudwatch logs
  • if both are possible, which would be preferable from an AWS architecture/Pythonic standpoint?
DaveRGP

3 Answers


To capture the warning and prevent it from being passed into the cloudwatch logs, you can filter the warning as follows.

import json
import warnings

# Promote warnings to exceptions so the joblib UserWarning raised
# while importing sklearn can be caught and silenced.
warnings.filterwarnings('error')
try:
    import sklearn
except Warning:
    pass

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

The article here, particularly towards the end, recreates and filters the warning.

sedeh

None of the suggested solutions worked for me. Digging into the joblib source code here: https://github.com/joblib/joblib/blob/master/joblib/_multiprocessing_helpers.py, I discovered the environment variable JOBLIB_MULTIPROCESSING, which seems to control whether joblib attempts to use multiprocessing.

Setting this to 0 solved the problem for me.
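A minimal sketch of a handler module using this approach, assuming (per the joblib source linked above) that the variable is read when joblib is first imported; the `try`/`except` only guards against sklearn being absent in a local environment:

```python
import os

# JOBLIB_MULTIPROCESSING is read when joblib is first imported, so it
# must be set before sklearn (which imports joblib) is imported.
os.environ["JOBLIB_MULTIPROCESSING"] = "0"

try:
    import sklearn  # the serial-mode UserWarning should no longer appear
except ImportError:
    sklearn = None  # sklearn is not installed in this environment
```

On Lambda you can instead set `JOBLIB_MULTIPROCESSING=0` as a function environment variable in the configuration, which guarantees it is set before any import runs and keeps the code untouched.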


AWS Lambda now supports larger memory allocations, up to about 10 GB. I faced the same problem; I configured 10 GB of memory and that fixed it. (My program actually used only 248 MB of memory.) I don't know why a small memory setting caused the joblib problem when importing sklearn, though.
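For reference, a sketch of raising the memory setting from the AWS CLI (`my-function` is a placeholder name). Lambda allocates CPU in proportion to configured memory, so raising memory also increases the CPU available to the function:

```shell
# Raise the function's memory allocation (in MB, up to 10240).
aws lambda update-function-configuration \
    --function-name my-function \
    --memory-size 10240
```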

cowkami
  • AWS Lambda scales CPUs based on how much memory you assign. So if you had a low amount of memory initially, you may have had only one CPU available, whereas with 10 GB you should get 6 CPUs. – Arran Duff Aug 24 '23 at 11:22