Python 3.6 - Scrapy 1.5
I'm scraping the John Deere warranty webpage to watch all new PMP's and its expiration date. Looking inside network communication between browser and webpage I found a REST API that feed data in webpage.
Now, I'm trying to get json data from API rather scraping the javascript page's content. However, I'm getting a Internal Server Error and I don't know why.
I'm using scrapy to log in and catch data.
import scrapy
class PmpSpider(scrapy.Spider):
name = 'pmp'
start_urls = ['https://jdwarrantysystem.deere.com/portal/']
def parse(self, response):
self.log('***Form Request***')
login ={
'USERNAME':*******,
'PASSWORD':*******
}
yield scrapy.FormRequest.from_response(
response,
url = 'https://registration.deere.com/servlet/com.deere.u90950.registrationlogin.view.servlets.SignInServlet',
method = 'POST', formdata = login, callback = self.parse_pmp
)
self.log('***PARSE LOGIN***')
def parse_pmp(self, response):
self.log('***PARSE PMP***')
cookies = response.headers.getlist('Set-Cookie')
for cookie in cookies:
cookie = cookie.decode('utf-8')
self.log(cookie)
cook = cookie.split(';')[0].split('=')[1]
path = cookie.split(';')[1].split('=')[1]
domain = cookie.split(';')[2].split('=')[1]
yield scrapy.Request(
url = 'https://jdwarrantysystem.deere.com/api/pip-products/collection',
method = 'POST',
cookies = {
'SESSION':cook,
'path':path,
'domain':domain
},
headers = {
"Accept":"application/json",
"accounts":["201445","201264","201167","201342","201341","201221"],
"excludedPin":"",
"export":"",
"language":"",
"metric":"Y",
"pipFilter":"OPEN",
"pipType":["MALF","SAFT"]
},
meta = {'dont_redirect': True},
callback = self.parse_pmp_list
)
def parse_pmp_list(self, response):
self.log('***LISTA PMP***')
self.log(response.body)
Why am I getting an error? How to get data from this API?
2018-07-05 17:26:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 1 times): 500 Internal Server Error
2018-07-05 17:26:20 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 2 times): 500 Internal Server Error
2018-07-05 17:26:21 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 3 times): 500 Internal Server Error
2018-07-05 17:26:21 [scrapy.core.engine] DEBUG: Crawled (500) <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (referer: https://jdwarrantysystem.deere.com/portal/)
2018-07-05 17:26:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 https://jdwarrantysystem.deere.com/api/pip-products/collection>: HTTP status code is not handled or not allowed