Can't reach this specific CSS / HTML tag in Python web scraping

Question

Editing:

I am doing web scraping with Beautiful Soup.

I've tried a lot of things but can't reach this part of the HTML:

[Screenshot of the page's HTML; image not available]

I tried this (and other variations), but it returns an empty list:

iptu = [iptu.get_text() for iptu in soup.find_all("article", {"data-clickstream":"iptuPrices"})]

How can I share the HTML? It is too big to copy and paste.

You'll need to provide some more context on it please. Are you using BeautifulSoup on this one? Also see https://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup for a potential solution. — jayg_code, Nov 06 '19 at 23:12
Here is the link: https://www.zapimoveis.com.br/aluguel/casas-de-condominio/agr+rj++barra-e-recreio/ But the HTML at this link changes randomly! — Ferby, Nov 08 '19 at 20:34

score 1 · Answer 1 · 2019-11-09T23:56:54.057

From your image, it looks like the data you want is in a JSON string in an attribute of the article tag. If so, then perhaps something like this can get you started.

from bs4 import BeautifulSoup
import json
import requests

url = 'https://www.zapimoveis.com.br/aluguel/casas-de-condominio/agr+rj++barra-e-recreio/'

# Send a browser-like User-Agent; some sites reject the default one.
user_agent = {'User-agent': 'Mozilla/5.0'}
resp = requests.get(url, headers=user_agent)

soup = BeautifulSoup(resp.text, features="html.parser")

prices = []
for a in soup.find_all('article'):
    # The data-clickstream attribute holds a JSON string, not plain text.
    b = a.get('data-clickstream')
    if not b:
        continue  # skip articles without the attribute
    o = json.loads(b)
    # Convert each entry of iptuPrices to float and sum them per article.
    prices.append(sum(map(float, o['iptuPrices'])))

print(prices)
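
To see why the attribute filter in the question comes back empty, here is a small offline sketch of the same idea. The sample markup, the `listingId` key, and the helper name `extract_iptu_totals` are made up for illustration; the real page's attribute contents may look different. It assumes `iptuPrices` is a key inside the JSON stored in `data-clickstream`, as the answer above suggests.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may differ.
SAMPLE_HTML = """
<article data-clickstream='{"iptuPrices": ["150", "200.5"], "listingId": "a1"}'>Listing A</article>
<article data-clickstream='{"iptuPrices": ["0"], "listingId": "b2"}'>Listing B</article>
<article>No clickstream data</article>
"""


def extract_iptu_totals(html):
    """Return one summed iptuPrices value per <article> that carries the JSON attribute."""
    soup = BeautifulSoup(html, "html.parser")
    totals = []
    for article in soup.find_all("article"):
        raw = article.get("data-clickstream")
        if not raw:
            continue  # no attribute on this article
        try:
            data = json.loads(raw)  # attribute value is a JSON string
        except json.JSONDecodeError:
            continue  # attribute present but not valid JSON
        if not isinstance(data, dict):
            continue  # valid JSON, but not an object with keys
        values = data.get("iptuPrices", [])
        totals.append(sum(float(v) for v in values))
    return totals


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The filter from the question looks for an attribute whose whole value
# equals "iptuPrices". Here the value is a JSON string, so nothing matches.
exact_matches = soup.find_all("article", {"data-clickstream": "iptuPrices"})

totals = extract_iptu_totals(SAMPLE_HTML)

print(exact_matches)
print(totals)
```

On this sample, the exact-match filter prints `[]`, while parsing the attribute as JSON prints `[350.5, 0.0]`. The point is that `iptuPrices` is a key inside the attribute's value, so it has to be read with `json.loads` rather than matched as the attribute value itself.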

edited Nov 09 '19 at 23:56

answered Nov 06 '19 at 23:28

Thanks for helping! I tried it, but it did not work. How can I tell the difference between JSON and HTML? – Ferby Nov 08 '19 at 21:29
Edited my answer to read the data from the URL given. Here's a [working demo](https://repl.it/repls/OilyMealyArgument) – Nov 09 '19 at 23:57
