scrapy shell enable javascript

Question

I am trying to get the response.body of https://www.wickedfire.com/ in scrapy shell, but the response.body says:

<html><title>You are being redirected...</title>\n<noscript>Javascript is required. Please enable javascript before you are allowed to see this page...

How do I enable JavaScript? Or is there something else I can do?

Thank you in advance
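You can check in code whether a response is this JavaScript challenge page rather than the real site, before trying to parse it. Below is a minimal sketch. It assumes the marker phrases from the redirect page quoted above, and the site may change that wording. looks_like_js_challenge is a hypothetical helper, not part of Scrapy:

```python
# Marker phrases copied from the redirect page quoted in the question.
# These are assumptions and may not match other "enable JavaScript" pages.
CHALLENGE_MARKERS = ("javascript is required", "you are being redirected")


def looks_like_js_challenge(body: bytes) -> bool:
    """Return True if an HTML body looks like a 'please enable JavaScript' page."""
    text = body.decode("utf-8", errors="replace").lower()
    # Require a <noscript> block plus at least one of the marker phrases
    return "<noscript" in text and any(m in text for m in CHALLENGE_MARKERS)
```

In scrapy shell, looks_like_js_challenge(response.body) should return True for the page above. It only detects the problem. Getting the real page still needs something that runs the JavaScript.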

UPDATE: I've installed scrapy-splash (pip install scrapy-splash) and put these settings in settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPLASH_URL = 'http://localhost:8050/'

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

It gives me an error:

NameError: Module 'scrapy_splash' doesn't define any object named 'SplashCoockiesMiddleware'

After that error I commented that line out, and then it ran.
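Scrapy raises this NameError when it imports each middleware by its dotted path. A misspelled class name (SplashCoockiesMiddleware instead of SplashCookiesMiddleware) therefore only shows up at startup. Commenting the line out hides the error, but it also leaves SplashCookiesMiddleware disabled. Below is a minimal standard-library sketch that checks every dotted path up front. find_bad_paths is a hypothetical helper:

```python
import importlib


def find_bad_paths(paths):
    """Return the dotted paths ('package.module.ObjectName') that cannot be resolved.

    This is the same form used for the keys of DOWNLOADER_MIDDLEWARES and
    SPIDER_MIDDLEWARES, so a settings dict can be passed in directly.
    """
    bad = []
    for path in paths:
        module_name, _, attr = path.rpartition(".")
        try:
            module = importlib.import_module(module_name)
        except (ImportError, ValueError):
            # Missing package/module, or a path with no dot at all
            bad.append(path)
            continue
        if not hasattr(module, attr):
            # Module exists but the name is wrong, e.g. a typo in the class name
            bad.append(path)
    return bad
```

For example, find_bad_paths(["collections.OrderedDict", "collections.OrderdDict"]) returns ["collections.OrderdDict"]. With scrapy-splash installed and the spelling fixed, find_bad_paths(DOWNLOADER_MIDDLEWARES) should return an empty list.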

My script looks like this, but it doesn't work:

...
from scrapy import FormRequest
from scrapy_splash import SplashRequest, SplashFormRequest
...

    start_urls = ['https://www.wickedfire.com/login.php?do=login']

    # Login credentials (left blank here)
    payload = {'vb_login_username': '', 'vb_login_password': ''}

    def start_requests(self):
        # Render the login page through Splash, waiting 1 s for JavaScript
        for url in self.start_urls:
            yield SplashRequest(url, self.parse, args={'wait': 1})

    def parse(self, response):
        # url = "https://www.wickedfire.com/login.php?do=login"
        # Fill in and submit the login form found on the rendered page
        return SplashFormRequest.from_response(
            response, formdata=self.payload, callback=self.after_login)

    def after_login(self, response):
        print(response.text + "THIS IS THE BODY")
        if "incorrect" in response.text:
            self.logger.error("Login failed")
            return
        # Logged in: submit the search form with the keyword
        return FormRequest.from_response(
            response,
            formdata={'query': 'bitter'},
            callback=self.parse_page)

...
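There is also a Python 3 pitfall in the login check. Scrapy's response.body is raw bytes, so testing a str such as "incorrect" against it raises TypeError instead of returning False. For HTML responses, response.text gives the decoded string. Below is a hypothetical stand-alone illustration of the same check on raw bytes, assuming UTF-8 content:

```python
def login_failed(body: bytes, marker: str = "incorrect", encoding: str = "utf-8") -> bool:
    """Return True if the failure marker appears in a raw response body.

    Hypothetical helper: the bytes are decoded first, because
    `marker in body` with a str marker and a bytes body raises TypeError.
    """
    text = body.decode(encoding, errors="replace")
    return marker in text


if __name__ == "__main__":
    print(login_failed(b"Password incorrect, try again"))  # True
    print(login_failed(b"Welcome back"))                   # False
    try:
        "incorrect" in b"Password incorrect"
    except TypeError as exc:
        print("TypeError:", exc)
```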

This is the error that I get:

[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 1 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 2 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 3 times): 502 Bad Gateway
[scrapy.core.engine] DEBUG: Crawled (502) <GET https://wickedfire.com/ via http://localhost:8050/render.html> (referer: None) ['partial']
[scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://wickedfire.com/>: HTTP status code is not handled or not allowed
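The log shows the request went to http://localhost:8050/render.html, so Scrapy did reach Splash. The 502 most likely means Splash itself failed to load the target page. Calling render.html directly with the same url and wait arguments should show Splash's own error message. Below is a sketch that assumes Splash is listening on the SPLASH_URL from the settings above. splash_render_url and show_splash_response are hypothetical helpers:

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen


def splash_render_url(splash_url: str, target: str, wait: float = 1, timeout: float = 30) -> str:
    """Build a GET URL for Splash's render.html endpoint (url, wait, timeout arguments)."""
    query = urlencode({"url": target, "wait": wait, "timeout": timeout})
    return urljoin(splash_url, "render.html") + "?" + query


def show_splash_response(splash_url: str, target: str) -> None:
    """Fetch a page through Splash and print the status and the start of the body.

    On an error status Splash normally returns a description of the failure,
    which says more than the bare 502 in Scrapy's log.
    """
    try:
        with urlopen(splash_render_url(splash_url, target), timeout=60) as resp:
            print(resp.status, resp.read(500))
    except HTTPError as err:
        # Error statuses (e.g. 502) still carry a body worth reading
        print(err.code, err.read(500))
    except URLError as err:
        # Splash not running or not reachable at splash_url
        print("Could not reach Splash:", err.reason)
```

For example, run show_splash_response("http://localhost:8050/", "https://www.wickedfire.com/"). You can also pass the URL from splash_render_url to fetch() inside scrapy shell. That should give a rendered response without going through the scrapy-splash middlewares.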

I also tried Scrapy with Splash in scrapy shell, following this guide.

I just want to log in to the site, enter a search keyword, and get the results. That is my end goal.

Scrapy does not interpret JavaScript. You'll need a JavaScript renderer at least, like [Splash](https://splash.readthedocs.io/en/stable/) for example. — paul trmbrth, Aug 24 '17 at 12:58
Or you need to use Selenium or a browser extension for scraping. Scrapy will not work here without integrating with Splash. — Tarun Lalwani, Aug 24 '17 at 13:16
Thank you. I was thinking of Selenium, but I can't use a browser on a server. I will try Splash and hope it works. — Omega, Aug 24 '17 at 13:20
The error message shows that you have a typo in the middleware name. — Mikhail Korobov, Sep 01 '17 at 10:13
