
I wish to download data for thousands of records from a government site using Python 2.7. One example of a record is http://camara.cl/pley/pley_detalle.aspx?prmID=1252&prmBL=1-07. Two related problems:

(1) the site relies on mouse clicks (in the source: <a href="javascript:__doPostBack(&#39;ctl00$mainPlaceHolder$btnUrgencias&#39;,&#39;&#39;)">Urgencias</a>) to access another part of the data that interests me; and

(2) I am illiterate in web scraping in general and Python in particular.

Learning-by-doing has so far taken me about half-way. Internet resources here, here, and here pushed me in the right direction. But I hit a wall.

I can retrieve the source of the page that loads when the URL is requested:

import requests
id = '1252'
bl = '1-07'
url = 'http://camara.cl/pley/pley_detalle.aspx'
parametros = {'prmID': id, 'prmBL': bl}

r = requests.get(url, params = parametros)
hitos = r.text
print hitos

But I've had no success in getting the information behind the 'Urgencias' tab. One attempt looks like this:

import json
parametros = {'prmID': id, 'prmBL': bl, '__EVENTTARGET': 'ctl00$mainPlaceHolder$btnUrgencias'}
headers = {'content-type': 'application/x-www-form-urlencoded; charset=utf-8'}

p = requests.post(url, data = json.dumps(parametros), headers = headers)
urgencias = p.text
print urgencias

I am obviously not building/sending the request properly. (I am also missing cookies, I believe.)

Any help will be greatly appreciated. (I am open to using any method that works on an Ubuntu machine!)
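For context, here is a rough, untested sketch of what a `__doPostBack` link usually expects on an ASP.NET WebForms page. It assumes (not verified against this site) that the page carries hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION` which must be sent back unchanged, that the body must be ordinary form-encoded data rather than JSON, and that cookies should be kept in a session. `HiddenFieldParser`, `build_postback`, and `fetch_tab` are hypothetical helper names, not part of `requests`.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def build_postback(html, event_target, event_argument=''):
    """Build the form payload that mimics clicking a __doPostBack link."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)  # assumed to include __VIEWSTATE and similar
    payload['__EVENTTARGET'] = event_target
    payload['__EVENTARGUMENT'] = event_argument
    return payload


def fetch_tab(url, params, event_target):
    """GET the page, then POST the hidden fields back within one session."""
    import requests  # third-party; imported here so the helpers above stay stdlib-only
    session = requests.Session()  # keeps cookies between the two requests
    page = session.get(url, params=params)
    payload = build_postback(page.text, event_target)
    # Pass a dict as `data` so requests form-encodes it; no json.dumps,
    # no hand-written content-type header.
    return session.post(url, params=params, data=payload).text
```

With the variables from the snippets above, the call would be `fetch_tab(url, {'prmID': id, 'prmBL': bl}, 'ctl00$mainPlaceHolder$btnUrgencias')`. Whether the server accepts this depends on details of the page that I have not confirmed.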

emagar
  • Before you spend too much time on scraping, have you checked thoroughly that there is no official API or FTP site, or the possibility of one being provided if you ask nicely? I don't mean to be condescending, just offering up the possibility of a much easier option which may have been overlooked. – Stephen Kennedy Oct 22 '14 at 21:38
  • Thank you @StephenKennedy for your answer. I overlooked that option; past experience with Latin American legislatures' web pages has led me to take the absence of such user-friendly resources for granted. The site map in question offers no clue as to the existence of an official API or FTP. Any hints on how to determine their presence? – emagar Oct 23 '14 at 01:22
  • I don't speak Spanish so can't really help there, but you could always drop them an email or call them. You might even be able to get a DVD. Chances are they will say no, but imho it's worth the few minutes to find out before you spend hours or days on a brute force solution. Good luck either way. – Stephen Kennedy Oct 23 '14 at 09:13
  • Have done so. You have made me reconsider my prejudices... I'll let you know if it comes to something. – emagar Oct 23 '14 at 15:11
  • Three months later, I can report with confidence to @StephenKennedy that my friendly request was ignored by the congressional staff. They have better ways to spend their time than helping a foreign social scientist; I can't blame them. But I was able to scrape the site successfully and will report on that in a separate post. Cheers! – emagar Feb 05 '15 at 19:57
  • @emagar did you ever write up a post on how you solved this? Would love to see what worked. – maddie May 01 '17 at 02:37
