
I want to extract information from a webpage on demand.

A user will submit a URL in a form; I want to extract the info from that page and render it back to the user.

I want to simulate

    scrapy shell "my_url"

in my runtime code.

Is there some Scrapy utility that, given a URL, gives me an `HtmlXPathSelector` (`hxs`) object?

nizam.sp
  • Are you required to use `scrapy`? What about `requests` + `lxml` solution (`lxml` supports xpath)? – alecxe Jun 24 '13 at 18:45
  • I already have a Scrapy project that crawls and parses for many requests. I want to reuse it for a single request now. – nizam.sp Jun 24 '13 at 18:55
  • What about making a separate spider, define `start_urls=[url]` that the user submitted and start the spider via `reactor.run` (like [here](http://stackoverflow.com/questions/13437402/how-to-run-scrapy-from-within-a-python-script))? – alecxe Jun 24 '13 at 19:02
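
As far as I know, there isn't a single one-call utility for this, but you can assemble the same `hxs` object yourself. Below is a minimal sketch, assuming `requests` is installed and an older Scrapy release that still ships `HtmlXPathSelector`:

    import requests
    from scrapy.http import HtmlResponse
    from scrapy.selector import HtmlXPathSelector

    def hxs_from_url(url):
        # Fetch the page and wrap the body in a selector,
        # roughly what `scrapy shell` prepares for you.
        body = requests.get(url).content
        response = HtmlResponse(url=url, body=body)
        return HtmlXPathSelector(response)

    hxs = hxs_from_url("http://example.com")
    print(hxs.select("//title/text()").extract())

Note this bypasses Scrapy's download machinery (middlewares, cookies, retries), so it only approximates what `scrapy shell` sets up.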

1 Answer


The crawler can be run as a normal Python script. Run the script from the project directory, the one containing the scrapy.cfg file (only the path matters; the script itself does not read scrapy.cfg).

    from twisted.internet import reactor
    from scrapy.crawler import Crawler
    from scrapy import log, signals
    from scrapy.utils.project import get_project_settings

    # Create an object of your spider class
    spider = SampleSpider(arguments_you_want_to_initialize_the_object_with)

    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    reactor.run()

We can add this code to a CGI script to invoke the spider, as sketched below.
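
For illustration, a minimal CGI wrapper might look like the following sketch. `SampleSpider`, its `start_url` argument, and the import path are placeholders rather than real names from the project; note also that Twisted's reactor can only run once per process, which is fine under CGI because each request starts a fresh process:

    #!/usr/bin/env python
    import cgi

    from twisted.internet import reactor
    from scrapy.crawler import Crawler
    from scrapy import signals
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.sample import SampleSpider  # placeholder import

    form = cgi.FieldStorage()
    url = form.getfirst("url", "")

    # Same pattern as above, parameterized by the submitted URL
    spider = SampleSpider(start_url=url)  # placeholder spider argument
    crawler = Crawler(get_project_settings())
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    reactor.run()  # blocks until the spider closes

    print("Content-Type: text/plain")
    print("")
    print("Finished crawling %s" % url)

For anything beyond a prototype, a long-running service (or handing crawls off to a separate worker process) scales better than launching Scrapy inside every CGI request.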

Niranjan Sagar