
I'm using bs4 and urllib2 to fetch some info from a website.

Here's the webpage.

I need to fetch the rest of the telephone number (3610...), but first I have to press a button that reveals the full number.

This information is located inside this div:

<div class="telefones">
        Telefone(s): <span id="telefones">3610...
        <span><input type="button" id="verTel" value="ver telefone completo"/></span></span>
</div>

Is it possible to achieve this by using bs4 with urllib2?

dot.Py
  • You may be interested in: http://stackoverflow.com/questions/12756443/fill-and-submit-html-form – Bakuriu Jul 01 '16 at 19:55
  • @Bakuriu, thanks! gonna take a look. – dot.Py Jul 01 '16 at 19:57
  • The answer would be very much specific to a particular webpage - could you share the link to the target site if possible? Thanks. – alecxe Jul 01 '16 at 19:58
  • @alecxe, edited my question. tks. – dot.Py Jul 01 '16 at 20:01
  • It looks like the site is trying to keep people from doing exactly what you're doing by hiding certain elements using JavaScript. The typical ways of circumventing this involve robot browsers, direct API interactions (as opposed to front-end scraping) or de-obfuscation; not sure which applies here. – jDo Jul 01 '16 at 20:04

2 Answers

The phone number is loaded from the response to a request to the http://www.ribeiraosaude.com.br/home/GetTelefone/<id> URL. Make this request with requests and extract the phone number from the JSON response:

import requests
from bs4 import BeautifulSoup

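page_id = 937
with requests.Session() as session:  # maintaining web-scraping session
    # load the detail page first, as a browser would
    response = session.get("http://www.ribeiraosaude.com.br/detalhe/%d" % page_id)
    # parse the detail page (not needed for the phone number itself)
    soup = BeautifulSoup(response.content, "html.parser")

    # call the endpoint the button's JavaScript uses and read the JSON reply
    phone_number = session.get("http://www.ribeiraosaude.com.br/home/GetTelefone/%d" % page_id).json()["telefone"]
    print(phone_number)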
alecxe
  • Hmmm. Thanks for your explanation, it's a lot clearer to me now. So it looks like the rest of the number is loaded after a request, and this request is made by the JavaScript code that runs when I press the button with `id=verTel`. Thanks a lot! – dot.Py Jul 01 '16 at 20:11

How you approach this depends on what happens when the button is clicked.

If the button triggers JavaScript that displays the number, you can scrape the number from the JavaScript called by the button.

E.g. function displayNumber() { document.getElementById('yourSpan').innerHTML = 'NUMBER'; }
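As a rough sketch of that first case, the number can be pulled out of the page source with a regular expression. The HTML below is entirely hypothetical (the function name, element id, and number are made up for illustration); with bs4 you could first narrow the search to the `<script>` tags via `soup.find_all("script")`, but the idea is the same.

```python
import re

# Hypothetical page source: the function name, element id and number are
# made up for illustration and do not come from the real site.
SAMPLE_HTML = """
<html><body>
<span id="yourSpan">3610...</span>
<input type="button" id="verTel" onclick="displayNumber()"/>
<script>
function displayNumber() {
    document.getElementById('yourSpan').innerHTML = '3610-0000';
}
</script>
</body></html>
"""


def number_from_inline_js(html, function_name="displayNumber"):
    """Return the string literal assigned to innerHTML inside the named function, or None."""
    pattern = (
        r"function\s+" + re.escape(function_name) + r"\s*\([^)]*\)\s*\{"
        r".*?innerHTML\s*=\s*(['\"])(.*?)\1"
    )
    match = re.search(pattern, html, flags=re.DOTALL)
    return match.group(2) if match else None


print(number_from_inline_js(SAMPLE_HTML))
```

This only works when the value is hard-coded in the script; if the function fetches the number from the server instead, there is nothing in the page source to match.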

However, if the button causes an ajax request, you can mimic the action of the page and interact with the server directly using the fantastic requests library (a third-party package, not part of Python's standard library).

E.g. phone_number = session.get("http://www.ribeiraosaude.com.br/home/GetTelefone/%d" % page_id).json()["telefone"] (Credit: alecxe)
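A slightly more defensive version of that call might look like the sketch below. The helper name `fetch_phone` and the `base_url` parameter are my own; the two URLs and the `"telefone"` key are taken from alecxe's answer. Whether the detail page has to be loaded first is an assumption, not something confirmed for this site.

```python
def fetch_phone(session, page_id, base_url="http://www.ribeiraosaude.com.br"):
    """Return the phone number for one listing by calling the JSON endpoint directly."""
    # Load the detail page first so the session picks up any cookies the
    # endpoint might expect (an assumption; it may not be required).
    session.get("%s/detalhe/%d" % (base_url, page_id)).raise_for_status()

    response = session.get("%s/home/GetTelefone/%d" % (base_url, page_id))
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    data = response.json()
    # "telefone" is the key used in alecxe's answer; None is returned if it is absent.
    return data.get("telefone")


def main():
    # Imported here so fetch_phone itself has no hard dependency on requests.
    import requests

    with requests.Session() as session:
        print(fetch_phone(session, 937))
```

Calling `main()` performs the real requests; `fetch_phone` accepts any object with a requests-style `get` method, which makes it easy to exercise without touching the network.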

However, regardless of how the button works, there is one more option. It is also popular to use Selenium, which drives a real browser that can be controlled from Python. For information on clicking buttons with Selenium, see this answer
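For the Selenium route, a minimal sketch could look like this. It assumes the element ids from the question (`verTel` for the button, `telefones` for the span) and assumes that the truncated text ends in `...` until the page's JavaScript replaces it; neither the polling approach nor the helper names come from the original answers.

```python
import time


def reveal_phone(driver, url, timeout=10.0, poll=0.25):
    """Open url, click the reveal button, and return the full phone text.

    `driver` is expected to be a Selenium WebDriver (Selenium 4 style API).
    """
    driver.get(url)
    # "id" is the locator strategy string that selenium's By.ID stands for.
    driver.find_element("id", "verTel").click()

    deadline = time.monotonic() + timeout
    while True:
        # Re-find the element on every pass, since the page may rebuild it.
        text = driver.find_element("id", "telefones").text.strip()
        # Assumption: the truncated number ends with "..." until the page's
        # JavaScript swaps in the full one.
        if text and not text.endswith("..."):
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("phone number was not revealed in time")
        time.sleep(poll)


def run_with_chrome(url):
    """Convenience wrapper; needs the selenium package and a Chrome install."""
    from selenium import webdriver  # imported lazily so reveal_phone has no hard dependency

    driver = webdriver.Chrome()
    try:
        return reveal_phone(driver, url)
    finally:
        driver.quit()
```

Something like `run_with_chrome("http://www.ribeiraosaude.com.br/detalhe/937")` would then launch a browser, click the button, and return the text. This is much slower than calling the endpoint directly, so it is mostly worth it when the request cannot easily be reproduced by hand.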

BSL-5
  • Wow... that's a lot of useful info. So I must look at the webpage source to see if the button calls a javascript function or an ajax function? Looking at alecxe's answer, I saw that if I had searched for the `button id` I would have found the javascript function... is it the same logic for ajax? – dot.Py Jul 01 '16 at 20:14
  • Looking at the specific webpage you have, you'll want method two or method three. Ajax is a method of communication between webpages and servers. In this case, the website makes a request to the server to get the telephone number. For your application, it makes sense to cut out the middle man (the button) and make the request directly (alecxe has a good example of this). For a quick intro (1 min) to what ajax is and what it's used for, see [this video by Udacity](https://www.youtube.com/watch?v=P5JlebbqzTQ&noredirect=1). – BSL-5 Jul 01 '16 at 20:21