I'm currently trying to get Scrapy to run in a Google Cloud Function.
from flask import escape
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def hello_http(request):
    settings = get_project_settings()
    process = CrawlerProcess(settings)
    process.crawl(BlogSpider)
    process.start()
    return 'Hello {}!'.format(escape("Word"))
This works, but strangely enough, not all the time.
Every other time, the HTTP call returns an error, and I see this in Stackdriver:
Function execution took 509 ms, finished with status: 'crash'
I checked the spider and even simplified it to something that can't fail, such as:
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        yield {'id': 1}
Can someone explain to me what's going on?
Could it be a resource quota I'm hitting?
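In case it helps narrow things down: one pattern I suspect is that Cloud Functions reuses the same process between invocations, so module-level state survives, and Scrapy's `process.start()` runs a Twisted reactor that cannot be started a second time in the same process. Here is a stdlib-only sketch of that pattern (the class and function names are hypothetical, just to illustrate the alternating success/crash behavior, not actual Scrapy or Twisted code):

```python
# Hypothetical stand-in for a process-global resource, like Twisted's
# reactor, that can only be started once per process.
class OneShotReactor:
    def __init__(self):
        self.started = False

    def run(self):
        if self.started:
            # Mimics Twisted's ReactorNotRestartable error
            raise RuntimeError("reactor not restartable")
        self.started = True
        return "ok"

# Module-level state survives between invocations when the same
# function instance is reused by the platform.
_reactor = OneShotReactor()

def handle_request():
    try:
        return _reactor.run()
    except RuntimeError:
        return "crash"

print(handle_request())  # first invocation on a fresh instance -> ok
print(handle_request())  # second invocation on the reused instance -> crash
```

If this is what's happening, the first call on a fresh instance would succeed and the next call on the reused instance would crash, which would match the "every other time" symptom.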