I have been googling for two days and still can't solve this. Here is my question:
I want to crawl some Mac software from http://macdownload.informer.com/. What I really need is the real download link for each piece of software. For example: open http://macdownload.informer.com/basex/download/ and then click the download button.
The page is redirected and a download dialog pops up. Using F12 (developer tools) in the browser, I found that the response code is 302 and that the real file link is in the response's Location header.
My question is how I can get this response header in Python. My Python code looks like this:
import requests

response = requests.get('http://macdownload.informer.com/basex/download/?cf29b90&p555c=1')  # I only get response code 200, while I expected 302
real_download_link = response.headers['location']
print(real_download_link)
But the result is not correct. What I expect is a link like [ files.basex.org/releases/8.2/BaseX82.zip ].
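For reference, `requests.get` follows redirects by default, so the status it reports is that of the final response; any intermediate 3xx responses end up in `response.history`, and passing `allow_redirects=False` returns the first response as-is. The sketch below shows the same idea with only the standard library, since `http.client` never follows redirects on its own. The function name `get_redirect_location` is made up for illustration, and it assumes the 302 is returned directly for a plain GET of the URL, which may not be the case if the site issues the redirect from a separate AJAX request.

```python
import http.client
from urllib.parse import urlsplit


def get_redirect_location(url, timeout=15):
    """Send one GET and return (status, Location header or None).

    Hypothetical helper: http.client does not follow redirects, so a
    302 response is returned to the caller unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)

    # Rebuild the request target from path and query string.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    try:
        conn.request("GET", path)
        response = conn.getresponse()
        # Header lookup is case-insensitive; None if the header is absent.
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

If the returned status is 200 rather than 302, the redirect is probably triggered by a different request than the page URL itself.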
Then I inspected the download button and found that it triggers an AJAX operation. So I used Selenium to simulate the click, and yes, it works. But I can't get the response header in Selenium.
So, can anyone help me solve this problem? Either plain Python that extracts the response header and reads the Location field, or a way to get the response header through Selenium, would be fine. The Selenium code is as follows:
def parse_soft(self, response):
    soft_url = response.selector.xpath('//div[contains(@id,"download_content")]/div[2]/a/@href').extract()
    try:
        self.browser.set_page_load_timeout(15)
        self.browser.get(soft_url[0])
    except Exception as ex:
        print("Exception! " + str(ex))
    self.browser.find_element_by_class_name("download_btn").click()
    # TODO: Here I want to get the response header
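As far as I know, the standard WebDriver API does not expose HTTP response headers, so one possible workaround is to replay the download request outside the browser while reusing the browser's session cookies. The sketch below is only an illustration under assumptions: the helper names `cookie_header` and `fetch_location` are invented here, the cookie format is the list of dicts that `driver.get_cookies()` returns, and it assumes the button's request can be replayed as a plain GET, which may not hold for this site.

```python
import http.client
from urllib.parse import urlsplit


def cookie_header(cookies):
    """Build a Cookie header value from Selenium-style cookie dicts.

    Each dict is expected to carry at least 'name' and 'value', as in
    the list returned by driver.get_cookies().
    """
    return "; ".join("%s=%s" % (c["name"], c["value"]) for c in cookies)


def fetch_location(url, cookies, timeout=15):
    """GET url with the given cookies and return (status, Location or None).

    Hypothetical helper: http.client does not follow redirects, so a
    302 and its Location header are visible to the caller.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)

    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    # Only send a Cookie header when there is something to send.
    headers = {}
    value = cookie_header(cookies)
    if value:
        headers["Cookie"] = value

    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Inside `parse_soft`, this could be called with `self.browser.get_cookies()` and the URL read from the button, for example via `get_attribute("href")` on the element, in place of the TODO.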