
I am making an API which returns a JsonResponse containing the text scraped by Scrapy. When I run the scripts individually they run perfectly, but when I try to integrate the Scrapy script with Django I do not get any output.

What I want is to simply return the response to the request (which in my case is a Postman POST request).

Here is the code I am trying:

from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
import scrapy
from scrapy.crawler import CrawlerProcess


@csrf_exempt
def some_view(request, username):
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
        'LOG_ENABLED': 'false'
    })
    process_test = process.crawl(QuotesSpider)
    process.start()

    return JsonResponse({'return': process_test})


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/random',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        return response.css('.text::text').extract_first()

I am very new to Python and Django. Any kind of help would be much appreciated.

fat potato
  • I don't use those libraries but hopefully I can help on the more "general" Python. 1- What does `process_test` do in `process_test = process.crawl(QuotesSpider)`? In Python it's ok to not assign a return value to anything. 2- I'm tempted to say try with an instance of the class, so like this: `process.crawl(QuotesSpider())`. – Guimoute Oct 17 '18 at 12:47
  • @Guimoute `process_test` is supposed to be the JSON response to my request, and making it an instance did not help much. – fat potato Oct 17 '18 at 12:54
  • Sorry I'm dumb, I literally skipped a line while reading... – Guimoute Oct 17 '18 at 14:03
  • no problem glad you try to help – fat potato Oct 17 '18 at 14:43

1 Answer


In your code, `process_test` is the `Deferred` returned by `process.crawl()`, not the output of the crawling.

You need additional configuration to make your spider store its output "somewhere". See this SO Q&A about writing a custom pipeline.
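One common pattern (a sketch of my own, not part of the linked Q&A; the `ItemCollectorPipeline` name and the in-memory list are illustrative) is a pipeline that collects every scraped item into a list you can read after `process.start()` returns. Note that for a pipeline to see anything, the spider's `parse` must yield items (e.g. a dict), not return a bare string:

```python
class ItemCollectorPipeline:
    """Collects every scraped item into a class-level list."""

    items = []

    def process_item(self, item, spider):
        # Scrapy calls this once per item the spider yields.
        ItemCollectorPipeline.items.append(item)
        return item


# Enable it in the CrawlerProcess settings (path is hypothetical):
#     process = CrawlerProcess({
#         'ITEM_PIPELINES': {'myapp.pipelines.ItemCollectorPipeline': 100},
#     })
# and in the spider, yield a dict instead of returning a string:
#     yield {'text': response.css('.text::text').extract_first()}
```

After `process.start()` finishes, `ItemCollectorPipeline.items` holds the scraped data for the view to return.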

If you just want to synchronously retrieve and parse a single page, you may be better off using requests to retrieve the page, and parsel to parse it.

Apalala