1

I'm trying to get this number (circled in red in the screenshot) from the website https://www.banxico.org.mx/

I have this code to get it, but it returns an empty list:

import requests
from lxml import html

linktc = 'https://www.banxico.org.mx/'
pagetc = requests.get(linktc)
tree = html.fromstring(pagetc.content)
tipocambio = tree.xpath('//div[@id="vFIX"]//span[@class="valor"]//text()')
print("TC: ", tipocambio)

Does anyone know what the problem is?

Luis

2 Answers

2

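The issue is that the value you want is generated by JavaScript, so you need a library that can execute the page's scripts. requests only downloads the initial HTML, in which that element is still empty.

You can use Puppeteer via Node.js instead: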

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({
        headless: true,
    });

    const page = await browser.newPage();

    // set a desktop Firefox user agent
    await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0');

    // open main URL
    await page.goto('https://www.banxico.org.mx/', { waitUntil: 'networkidle2' });

    // wait for wanted selector to pop up
    await page.waitForXPath('//div[@id="vFIX"]//span[@class="valor"]');

    // retrieve the text content of the first matching text node
    const nodes = await page.$x('//div[@id="vFIX"]//span[@class="valor"]/text()');
    const text = await page.evaluate(node => node.textContent, nodes[0]);

    console.log(text);

    await browser.close();
})();

Output

22.6662
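
For readers who would rather stay in Python, as in the question, roughly the same browser-driven approach can be sketched with Selenium. This is only a sketch under assumptions: it presumes Selenium 4 and a local Chrome installation, and the function name `get_fix_with_selenium`, the constant `VALOR_XPATH`, and the 15-second timeout are illustrative choices, not part of the original answer.

```python
# XPath taken from the question; the value only appears after the page's scripts run.
VALOR_XPATH = '//div[@id="vFIX"]//span[@class="valor"]'


def get_fix_with_selenium(url="https://www.banxico.org.mx/", timeout=15):
    """Open the page in headless Chrome and return the rendered text of the span.

    Hypothetical helper: assumes Selenium 4 and Chrome are installed.
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until the span exists and holds non-empty text, or time out.
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.XPATH, VALOR_XPATH).text.strip()
        )
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `print(get_fix_with_selenium())` would start a real browser, so it is slower than a plain HTTP request, but it does not depend on finding a background data endpoint.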

See also the question "Web-scraping JavaScript page with Python".

Gilles Quénot
  • 1
    [Accepted answer](https://stackoverflow.com/a/62565881/290085) is a fine work-around (+1 too), but this technique is more generally applicable to sites with JavaScript-generated output. – kjhughes Jun 25 '20 at 00:17
  • 1
    Yes, I agree on both points: you don't always have JSON (or other raw data) accessible like this. This solution is more generic and usable in every situation I know of. – Gilles Quénot Jun 25 '20 at 00:20
2

JavaScript is needed to display the value. You could use Selenium to get it, or retrieve the data directly from the JSON that the page loads in the background:

import json
import urllib.request

# Fetch the JSON document that the page loads in the background
with urllib.request.urlopen("https://www.banxico.org.mx/canales/singleFix.json") as response:
    data = json.loads(response.read().decode())
    print(data['valor'])

Output: 22.6662
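
A slightly more defensive sketch of the same JSON approach separates the parsing from the network call and adds a timeout. It assumes the endpoint keeps returning a JSON object with a `valor` key that holds a plain decimal number, as the snippet above suggests; the names `FIX_URL`, `parse_fix`, and `fetch_fix` and the 10-second timeout are illustrative, not part of any Banxico API.

```python
import json
import urllib.request

# Endpoint taken from the answer above.
FIX_URL = "https://www.banxico.org.mx/canales/singleFix.json"


def parse_fix(payload: bytes) -> float:
    """Extract the exchange rate from the raw JSON payload.

    Assumes a top-level "valor" key holding a decimal number or numeric string.
    """
    data = json.loads(payload.decode("utf-8"))
    return float(data["valor"])


def fetch_fix(url: str = FIX_URL, timeout: float = 10.0) -> float:
    """Download the JSON document and return the rate as a float."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_fix(response.read())
```

Keeping `parse_fix` free of network access means it can be checked against a saved payload, while `fetch_fix()` is the only part that depends on the site being reachable.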

Alternatively, scrape the value from a different page that serves it as static HTML:

from lxml import html
import requests

url = 'https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es'
r = requests.get(url)
tree = html.fromstring(r.content)
# Last cell of the nested table in the 7th column of the first data row
value = tree.xpath('//tr[@id="nodo_0_0_0"]/td[7]//td[last()]')[0].text
print(value.strip())

Output: 22.6662

E.Wiest