I am scraping a website with Beautiful Soup.

I've tried many things but can't reach this part of the HTML:

[Screenshot of the page's HTML; image not available]

I tried this (and variations of it), but it returns an empty list:

iptu = [iptu.get_text() for iptu in soup.find_all("article", {"data-clickstream":"iptuPrices"})]

How can I share the HTML, since it is too big to copy and paste?

Ferby
  • You'll need to provide some more context on it please. Are you using BeautifulSoup on this one? Also see https://stackoverflow.com/questions/2612548/extracting-an-attribute-value-with-beautifulsoup for a potential solution. – jayg_code Nov 06 '19 at 23:12
  • Please provide the html/link. – Sin Han Jinn Nov 06 '19 at 23:13
  • Here is the link: https://www.zapimoveis.com.br/aluguel/casas-de-condominio/agr+rj++barra-e-recreio/ But the HTML served at this link changes randomly! – Ferby Nov 08 '19 at 20:34

1 Answer

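From your image, it looks like the data you want is stored as a JSON string in the data-clickstream attribute of each article tag. Your find_all call only matches tags whose data-clickstream value is exactly "iptuPrices", which is why it returns an empty list. If that is the case, something like this may get you started: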

from bs4 import BeautifulSoup
import json
import requests

url = 'https://www.zapimoveis.com.br/aluguel/casas-de-condominio/agr+rj++barra-e-recreio/'

user_agent = {'User-agent': 'Mozilla/5.0'}
resp = requests.get(url, headers=user_agent)

soup = BeautifulSoup(resp.text, features="html.parser")

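prices = []
for article in soup.find_all('article'):
    # The listing data is a JSON string stored in the data-clickstream attribute.
    raw = article.get('data-clickstream')
    if not raw:
        continue
    data = json.loads(raw)
    # Skip articles whose JSON has no iptuPrices entry.
    if 'iptuPrices' not in data:
        continue
    # iptuPrices is treated as a list of numeric strings, summed per listing.
    prices.append(sum(map(float, data['iptuPrices'])))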

print(prices)
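
To see the difference between the HTML and the JSON without hitting the live site, here is a minimal offline sketch. It uses the standard library's html.parser instead of BeautifulSoup so that it runs with no extra packages. The sample markup, the ClickstreamParser class and the helper names are made up for illustration, and the real page's attribute contents may look different. The article tag is HTML; the value of its data-clickstream attribute is a JSON string that has to be decoded with json.loads before you can read iptuPrices from it.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample markup; the real page's attributes and values may differ.
SAMPLE_HTML = """
<article data-clickstream='{"iptuPrices": ["120", "80.5"], "id": "a1"}'>Listing A</article>
<article data-clickstream='{"id": "a2"}'>Listing B</article>
<article>Listing C</article>
"""


class ClickstreamParser(HTMLParser):
    """Collects the raw data-clickstream attribute of every <article> tag."""

    def __init__(self):
        super().__init__()
        self.raw_values = []

    def handle_starttag(self, tag, attrs):
        if tag != "article":
            return
        raw = dict(attrs).get("data-clickstream")
        if raw:
            self.raw_values.append(raw)


def clickstream_values(html):
    """Return the raw attribute strings (still JSON text at this point)."""
    parser = ClickstreamParser()
    parser.feed(html)
    return parser.raw_values


def iptu_totals(raw_values):
    """Decode each JSON string and sum its iptuPrices, if it has any."""
    totals = []
    for raw in raw_values:
        data = json.loads(raw)  # JSON text -> Python dict
        if "iptuPrices" in data:
            totals.append(sum(map(float, data["iptuPrices"])))
    return totals


raw_values = clickstream_values(SAMPLE_HTML)

# Comparing the whole attribute value to "iptuPrices" never matches,
# because the value is the full JSON string, not just that key.
exact_matches = [raw for raw in raw_values if raw == "iptuPrices"]

totals = iptu_totals(raw_values)
print(exact_matches)  # []
print(totals)         # [200.5]
```

The exact_matches list is empty for the same reason the original find_all call returned nothing: the attribute value is the whole JSON string, so it has to be parsed before looking up the iptuPrices key.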
  • Thanks for helping! I tried it, but it did not work. How can I tell the difference between JSON and HTML? – Ferby Nov 08 '19 at 21:29
  • Edited my answer to read the data from the URL given. Here's a [working demo](https://repl.it/repls/OilyMealyArgument) –  Nov 09 '19 at 23:57