8

I am familiar with BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to be entered into the page before the result that I want to scrape is returned?

I'm trying to obtain the geographic distance between two addresses using this website: http://www.freemaptools.com/how-far-is-it-between.htm

I want to be able to go to the page, enter two addresses, click "Show", and then extract the "Distance as the Crow Flies" and "Distance by Land Transport" values and save them to a dictionary.

Is there any way to input data into a webpage using Python?

user728166
  • 247
  • 1
  • 3
  • 10
  • 2
    This isn't answering your question, but rather your problem. I used a Firefox extension called HttpFox to figure out what the website did in order to calculate the distance and it turns out it uses the Google Maps API. You can use it for free; please see: http://code.google.com/apis/maps/documentation/directions/. For example, execute the following in a shell prompt on Linux to get the JSON directions: curl "http://maps.googleapis.com/maps/api/directions/json?origin=london&destination=bristol&sensor=false" – Asim Ihsan Aug 13 '11 at 01:58

5 Answers5

1

I think you can also use PySide/PyQt, because they have a browser core of qtwebkit, you can control the browser to open pages, simulate human actions(fill, click...), then scrape data from pages. FMiner is work on this way, it's a web scraping software I developed with PySide.

Or you can try phantomjs, it's an easy library to control browser, but not it's javascript not python lanuage.

user2647646
  • 101
  • 5
1

Yes! Try mechanize for this kind of Web screen-scraping task.

Tim Smith
  • 6,127
  • 1
  • 26
  • 32
0

In addition with the answers already given, you could simply do a request on that page. Using your browser you could always inspect the Network (under Tools/Web Developer tools) behaviors and actions when you interact with the page. E.g. http://www.freemaptools.com/ajax/getaandb.php?a=Florida_Usa&b=New%20York_Usa&c=6052 -> request query for getting the results page you are expecting. Request that page and scrape the field you wanted to. IMHO, page requests are way faster than screen scraping (case-to-case basis).

But of course, you could always do screen scraping/browser simulation also (Mechanize, Splinter) and use headless browsers (PhantomJS, etc.) or the browser driver of the browser you want to use.

aldnav
  • 182
  • 10
0

The query may have been resolved.

You can use Selenium WebDriver for this purpose. A web page can be interacted using programming language. All the operations can be performed as if a human user is accessing the web page.

r_D
  • 578
  • 1
  • 7
  • 16