
I run Scrapy from a script (https://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script) launched from AWS Lambda. I build the project with SAM and everything works.

But now I have a problem with the LOG_LEVEL setting.

from scrapy.crawler import CrawlerProcess


def handler(event, context):
    settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.85 Safari/537.36',
        'LOG_ENABLED': True,
        'LOG_LEVEL': 'ERROR',
    }

    process = CrawlerProcess(settings=settings)
    process.crawl(Spider)  # Spider is defined elsewhere in the project
    process.start()

When I execute this code locally, everything works and I only receive ERROR-level logs, but when I execute it in AWS Lambda I get DEBUG-level logs, and I don't know how to fix it.

nicopasan
  • The Python environment in Lambda has a preconfigured root logger. I think Scrapy might be clashing with that, but I'm not 100% sure. Can you try configuring the Scrapy logging manually and use `scrapy.utils.log.configure_logging(install_root_handler=False)` to see if it helps? – Milan Cermak Jan 21 '19 at 18:20
  • @MilanCermak I tried this configuration but it doesn't work. I put the line before the settings dict, inside the Lambda handler. Is that correct? – nicopasan Jan 22 '19 at 08:44
  • Yes, that should be OK. What is weird to me is that you get even DEBUG messages. Something is definitely "messing" with the logging setup. Maybe try one more thing - at the top level (outside of the Lambda handler), get the root logger with `root = logging.getLogger()` and call the Scrapy `configure_logging` with your settings. HTH. – Milan Cermak Jan 22 '19 at 13:10
  • @MilanCermak I tried with `root = logging.getLogger()` outside of the Lambda handler and `scrapy.utils.log.configure_logging(settings=settings)` (with install_root_handler both True and False) after the settings dict, and it's the same behaviour. It's still showing the DEBUG logs. :( Any idea? – nicopasan Jan 22 '19 at 17:13
  • Whoops, sorry, somehow my comment above missed the main part I wanted to convey - remove all the handlers from the root logger first, before calling the configure function. – Milan Cermak Jan 22 '19 at 17:18
  • @MilanCermak, can you provide more information? I don't understand your last comment. Thanks for all. – nicopasan Jan 22 '19 at 17:27
  • Try something along these lines https://gist.github.com/milancermak/945b54107d6c238ac079ea2fcd39be29 – Milan Cermak Jan 22 '19 at 18:51
  • Milan, feel free to provide an answer (even copy mine), I’ll delete mine once you do. – Gallaecio Jan 24 '19 at 08:41

1 Answer


Based on the input from the OP in Scrapy issue #3587, it turns out AWS Lambda installs its own handlers on the root logger, so you need to remove those handlers before you use Scrapy:

from logging import getLogger

# AWS Lambda pre-installs handlers on the root logger; remove them
# before Scrapy configures its own logging.
getLogger().handlers = []

def handler(event, context):  # AWS Lambda entry point
    pass  # Your code to call Scrapy.
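The mechanism behind the fix can be demonstrated with the standard library alone, no Scrapy or Lambda required. A minimal sketch that simulates a pre-installed root handler (as Lambda does) and shows why clearing `root.handlers` lets a later logging configuration take effect:

```python
import logging

# Simulate AWS Lambda's environment: a handler already installed on the
# root logger, with a permissive level.
root = logging.getLogger()
preinstalled = logging.StreamHandler()
root.addHandler(preinstalled)
root.setLevel(logging.DEBUG)

# The fix: drop the pre-installed handlers before configuring logging.
root.handlers = []

# basicConfig only configures a root logger that has no handlers,
# which is why the line above is needed. Scrapy's own logging setup
# is similarly affected by pre-existing root handlers.
logging.basicConfig(level=logging.ERROR)

print(preinstalled in root.handlers)     # False
print(root.isEnabledFor(logging.DEBUG))  # False
print(root.isEnabledFor(logging.ERROR))  # True
```

With the pre-installed handler left in place, DEBUG records would still be emitted through it regardless of what level is configured afterwards, which matches the behaviour reported in the question.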
Gallaecio
    Yes, the Scrapy issue is mine, opened based on @Milan Cermak's help. I was waiting for him to post the answer, but your response is correct too. Thanks. – nicopasan Jan 24 '19 at 08:26
  • Hi @Gallaecio, I had another problem with the AWS Lambda container and Scrapy. The code doesn't fail locally, but when it executes in AWS Lambda containers twice in a short period of time it produces an error, which I put in the gist: https://gist.github.com/milancermak/945b54107d6c238ac079ea2fcd39be29 Thanks for the help! – nicopasan Jan 24 '19 at 12:01
  • Please open a separate question so that it can be found easily, both by people who know the answer and by people who may have the same question. – Gallaecio Jan 24 '19 at 13:04
  • You are right; this is the new question: https://stackoverflow.com/questions/54350888/runtime-error-on-aws-lambda-with-scrapy-reuse-container-issue , thanks for everything @Gallaecio – nicopasan Jan 24 '19 at 16:10
  • Hey folks. No worries @nicopasan, feel free to upvote and accept this answer. Glad we found a solution. – Milan Cermak Jan 24 '19 at 16:27
  • @MilanCermak Thank you so much for all! – nicopasan Jan 24 '19 at 16:34