I am trying to scrape data from sites such as

http://brochurer.ford.dk/Prisliste/Finansiering/FordPrivatleasing/

and

http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/

But I don't know whether it's possible to mimic the request and get the wanted data returned.

I can see two request URLs:

http://brochurer.ford.dk/Prisliste/Finansiering/FordPrivatleasing/Modules/Statistics/Statistics.asmx/RegisterData

http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData

But I can't see any response in the developer tools, and my spider gives me an error:

2017-01-10 09:59:27 [scrapy] DEBUG: Retrying <GET http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData> (failed 1 times): 500 Internal Server Error
2017-01-10 09:59:27 [scrapy] DEBUG: Retrying <GET http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData> (failed 2 times): 500 Internal Server Error
2017-01-10 09:59:27 [scrapy] DEBUG: Gave up retrying <GET http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData> (failed 3 times): 500 Internal Server Error
2017-01-10 09:59:27 [scrapy] DEBUG: Crawled (500) <GET http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData> (referer: None)
2017-01-10 09:59:27 [scrapy] DEBUG: Ignoring response <500 http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData>: HTTP status code is not handled or not allowed

Is it possible to get the desired data using a combination of Selenium and Scrapy?
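
To make "mimic the request" concrete, here is a minimal sketch of what I have in mind. It assumes the endpoint expects a JSON POST, as ASMX script services commonly do, rather than the plain GET my spider sent. That is an assumption, not something I have confirmed for these sites. The helper name `build_asmx_request` and the empty payload are hypothetical; the real parameter names would have to be copied from the request body shown in developer tools. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# URL taken from the Scrapy log above.
ENDPOINT = (
    "http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste"
    "/Modules/Statistics/Statistics.asmx/RegisterData"
)


def build_asmx_request(url, payload):
    """Build (but do not send) a JSON POST request for an ASMX web method.

    Assumption: the service accepts a JSON body with a JSON content type.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


# Hypothetical payload: replace {} with the parameters seen in the
# browser's developer tools for the real request.
request = build_asmx_request(ENDPOINT, {})
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen`, or building the equivalent `scrapy.Request` with `method="POST"`, the same `body`, and the same headers. Judging by its name, `RegisterData` may only log page statistics and not return the price data at all, so this may still not yield what I am after.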

Frank
  • What is this "wanted data" that you want? – Rafael Almeida Jan 10 '17 at 11:20
  • @Rafael Almeida, using Fiat as an example: basically I want to make an object from each row of the document. I'm looking for each vehicle name, e.g. "500 0.9 60 hk Star", and then the associated prices in the same row, for example the column "mdl. ydelse" (monthly cost). – Frank Jan 10 '17 at 11:36
  • But it's an image, you can't scrape that without having some kind of image recognition software – Rafael Almeida Jan 10 '17 at 11:54
  • @RafaelAlmeida Yes, but these images are being populated by some data requests, e.g. http://docs.fiat.dk/Prislister/fiatprivatleasingprisliste/Modules/Statistics/Statistics.asmx/RegisterData. So my question is whether I can mimic these requests and parse the data somehow. Perhaps something like this: http://stackoverflow.com/questions/33759652/how-to-scrape-data-from-asmx-web-service-generated-page – Frank Jan 10 '17 at 14:47

0 Answers