
I have a relatively simple linear regression lambda in AWS. Each time the function is called, the logs display the following:

/opt/python/sklearn/externals/joblib/_multiprocessing_helpers.py:38: UserWarning: [Errno 38] Function not implemented. joblib will operate in serial mode
warnings.warn('%s. joblib will operate in serial mode' % (e,))

I suspect this is due to sklearn running on a lambda (i.e. 'serverless') and trying to determine its multiprocessing capabilities, as per this question and this GH issue.

I also understand from the GH issue that this is not a 'fixable' issue; it will always happen when deploying with these dependencies on this hardware. I am getting back my expected results (even though I am currently maxing out the default, minimum Lambda memory of 128 MB).

I aim to control the warnings and would like to know if there is a way to either:

  • stop sklearn looking for multiprocessing, so preventing the warning from issuing
  • capture this specific warning and prevent it from being passed from my function into the cloudwatch logs
  • if both are possible, which would be preferable from an AWS architecture/Pythonic standpoint?
DaveRGP

3 Answers


To capture the warning and prevent it from being passed into the cloudwatch logs, you can filter the warning as follows.

import json
import warnings

# Promote warnings to exceptions so the joblib UserWarning raised
# while importing sklearn can be caught and silenced.
warnings.filterwarnings('error')
try:
    import sklearn
except Warning:
    pass

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

The article here, particularly towards the end, recreates and filters the warning.

sedeh

None of the suggested solutions worked for me. Digging into the joblib source code here: https://github.com/joblib/joblib/blob/master/joblib/_multiprocessing_helpers.py, I discovered the environment variable JOBLIB_MULTIPROCESSING, which seems to control whether joblib attempts to use multiprocessing.

Setting this to 0 solved the problem for me.
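A minimal sketch of a handler module using this approach, assuming (per the joblib source linked above) that the variable is read when joblib is first imported; the `try`/`except` only guards against sklearn being absent in a local environment:

```python
import os

# JOBLIB_MULTIPROCESSING is read when joblib is first imported, so it
# must be set before sklearn (which imports joblib) is imported.
os.environ["JOBLIB_MULTIPROCESSING"] = "0"

try:
    import sklearn  # the serial-mode UserWarning should no longer appear
except ImportError:
    sklearn = None  # sklearn is not installed in this environment
```

On Lambda you can instead set `JOBLIB_MULTIPROCESSING=0` as a function environment variable in the configuration, which guarantees it is set before any import runs and keeps the code untouched.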


AWS Lambda now supports larger memory allocations, up to about 10 GB. I faced the same problem; I configured 10 GB of memory and that fixed it. (My program actually used only 248 MB of memory.) I don't know why a small memory setting caused the joblib problem when importing sklearn, though.
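For reference, a sketch of raising the memory setting from the AWS CLI (`my-function` is a placeholder name). Lambda allocates CPU in proportion to configured memory, so raising memory also increases the CPU available to the function:

```shell
# Raise the function's memory allocation (in MB, up to 10240).
aws lambda update-function-configuration \
    --function-name my-function \
    --memory-size 10240
```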

cowkami
  • AWS Lambda scales CPUs based on how much memory you assign. So if you had a low amount of memory initially, you may have had only one CPU available, whereas with 10 GB you should get 6 CPUs. – Arran Duff Aug 24 '23 at 11:22