I am trying to get the response.body of https://www.wickedfire.com/ in scrapy shell, but the response.body only contains:
<html><title>You are being redirected...</title>\n<noscript>Javascript is required. Please enable javascript before you are allowed to see this page...
How do I enable JavaScript? Or is there something else that I can do?
Thank you in advance.
UPDATE: I've installed scrapy-splash (pip install scrapy-splash) and put these settings in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
SPLASH_URL = 'http://localhost:8050/'
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
It gives me an error:
NameError: Module 'scrapy_splash' doesn't define any object named 'SplashCoockiesMiddleware'
After that error I commented the line out, and it got past the error.
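Note that the error message spells the class 'SplashCoockiesMiddleware', while the settings shown above spell it 'SplashCookiesMiddleware'. One way to check which dotted path actually resolves is a small import test. This is only a rough sketch for Python 3; the helper name path_resolves is made up for illustration and is not part of Scrapy or scrapy-splash:

```python
from importlib import import_module


def path_resolves(dotted_path):
    """Return True if 'package.module.Name' can be imported and Name exists.

    Hypothetical helper, roughly mimicking how a dotted class path
    from settings.py would be looked up.
    """
    module_path, _, name = dotted_path.rpartition('.')
    try:
        module = import_module(module_path)
    except (ImportError, ValueError):
        # ImportError: module not installed; ValueError: no module part at all
        return False
    return hasattr(module, name)
```

Calling path_resolves('scrapy_splash.SplashCookiesMiddleware') and path_resolves('scrapy_splash.SplashCoockiesMiddleware') in the same environment that runs the spider should show which spelling the installed package defines.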
My spider looks like this, but it doesn't work:
...
from scrapy_splash import SplashRequest
...
    start_urls = ['https://www.wickedfire.com/login.php?do=login']
    payload = {'vb_login_username':'','vb_login_password':''}

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,args={'wait':1})

    def parse(self, response):
        # url = "https://www.wickedfire.com/login.php?do=login"
        r = SplashFormRequest(response,formdata=payload,callback=self.after_login)
        return r

    def after_login(self,response):
        print response.body + "THIS IS THE BODY"
        if "incorrect" in response.body:
            self.logger.error("Login failed")
            return
        else:
            results = FormRequest.from_response(response,
                                                formdata={'query': 'bitter'},
                                                callback=self.parse_page)
            return results
...
This is the error that I get:
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 1 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 2 times): 502 Bad Gateway
[scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://wickedfire.com/ via http://localhost:8050/render.html> (failed 3 times): 502 Bad Gateway
[scrapy.core.engine] DEBUG: Crawled (502) <GET https://wickedfire.com/ via http://localhost:8050/render.html> (referer: None) ['partial']
[scrapy.spidermiddlewares.httperror] INFO: Ignoring response <502 https://wickedfire.com/>: HTTP status code is not handled or not allowed
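To narrow down where the 502 comes from, the same render.html endpoint that appears in the log can be requested directly, outside Scrapy. The following is a minimal sketch for Python 3 using only the standard library. It assumes Splash is expected at the SPLASH_URL from settings.py and that render.html accepts the url and wait query parameters; the helper names build_render_url and check_splash are hypothetical:

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

SPLASH_URL = 'http://localhost:8050/'  # same value as in settings.py


def build_render_url(target_url, wait=1, splash_url=SPLASH_URL):
    """Build the Splash render.html URL for target_url (hypothetical helper)."""
    query = urlencode({'url': target_url, 'wait': wait})
    return urljoin(splash_url, 'render.html') + '?' + query


def check_splash(target_url, timeout=60):
    """Return the HTTP status Splash answers with, or None if unreachable."""
    try:
        with urlopen(build_render_url(target_url), timeout=timeout) as resp:
            return resp.status
    except HTTPError as exc:
        # Splash answered, but with an error status (for example 502)
        return exc.code
    except URLError:
        # Nothing is listening at SPLASH_URL, or it cannot be reached
        return None
```

If check_splash('https://www.wickedfire.com/') returns None, Splash is probably not running at that address; if it returns 502 as well, the problem would seem to be between Splash and the site rather than in the spider code.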
I also tried scrapy-splash with scrapy shell, following this guide.
I just want to log in to the page, submit a search keyword, and get the results. That is my end goal.