I want to build a crawler that takes the URL of a webpage to scrape and returns the scraped result to a web app. Right now I start Scrapy from the terminal and store the response in a file. How can I start the crawler when some input is posted to a Flask endpoint, process it, and return the response?
- Sorry, that last line is a little fuzzy. What are you doing with Flask? What process? And return the response back to where? – nivix zixer Jul 24 '15 at 04:01
- I'm using Flask to expose the endpoints, so that a web app can post an input, i.e. the link of the webpage to be scraped. I then want to start the spider with that input and return the crawler's response to the web app. – Ashish Jul 24 '15 at 04:06
- I just answered a similar question here: https://stackoverflow.com/questions/36384286/how-to-integrate-flask-scrapy – Pawel Miech May 17 '16 at 08:14
1 Answer
You need to create a CrawlerProcess inside your Flask application and run the crawl programmatically. See the docs.
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()  # The script will block here until the crawl is finished
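One thing the snippet above leaves open is getting the scraped items back so Flask can return them. A minimal sketch of one way to do that with Scrapy's item_scraped signal (collect_item and results are illustrative names, and MySpider is the class defined above):

from scrapy import signals
from scrapy.crawler import CrawlerProcess

results = []

def collect_item(item, response, spider):
    # Called once for every item the spider yields.
    results.append(item)

process = CrawlerProcess()
crawler = process.create_crawler(MySpider)
crawler.signals.connect(collect_item, signal=signals.item_scraped)
process.crawl(crawler)
process.start()  # blocks until the crawl is finished
# results now holds the scraped items, ready to serialize into a response.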
Before moving on with your project, I advise you to look into a Python task queue (like rq). Twisted's reactor cannot be restarted, so calling process.start() a second time in the same Flask process raises ReactorNotRestartable; a task queue sidesteps that and lets crawls run in the background, so your Flask application will not freeze while the scrapes are running.
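For example, a minimal sketch of that pattern with rq (this assumes Redis is running and a worker was started with rq worker; run_spider and my_spider.py are illustrative names):

import subprocess

from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue

app = Flask(__name__)
queue = Queue(connection=Redis())

def run_spider(url):
    # Running Scrapy in a subprocess keeps Twisted's non-restartable
    # reactor out of the long-lived worker process.
    subprocess.check_call(
        ['scrapy', 'runspider', 'my_spider.py', '-a', 'url=' + url])

@app.route('/crawl', methods=['POST'])
def crawl():
    # Enqueue the crawl and return immediately; the client can poll
    # for completion using the job id.
    job = queue.enqueue(run_spider, request.form['url'])
    return jsonify({'job_id': job.get_id()})

The worker has to be able to import run_spider, so start it from the same directory (or package) as this module.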

nivix zixer
- I have used this with plain Scrapy. Could you please provide a code snippet that runs the spider under a Flask application? – Vasim Aug 12 '15 at 09:33