
I am trying to find this "generic" tag (there is only a "span" tag). I've tried a lot of things, but none of them worked. The code below returns more than I want (I'm trying to get only the "573 m²").

Code:

Meters = [headline3.get_text() for headline3 in soup.find_all("ul", {"class": "feature__container"})]

Output:

['\n        573 m²\n        \n        4 \n        \n        4 \n        \n        4 \n        ',

HTML code (image): https://i.stack.imgur.com/uX7Ox.jpg

jmunsch
Ferby

1 Answer

First, find all the li elements. Then, for each li element, get its first direct span child and access its text.

Example:

meters = [li.find("span", recursive=False).get_text() for li in soup.find_all("li", { "class" : "feature__item" }) ]
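A minimal, self-contained sketch of the same lookup follows. The real page was only posted as a screenshot, so `SAMPLE_HTML` is hypothetical markup that assumes each `li.feature__item` has a `span` as a direct child; `first_span_texts` is likewise an illustrative helper name. It also passes `strip=True` to `get_text()` so the surrounding newlines and spaces are dropped.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption about the page structure,
# since the original HTML was only shown as an image.
SAMPLE_HTML = """
<ul class="feature__container">
  <li class="feature__item"><span>
        573 m²
  </span></li>
  <li class="feature__item"><span> 4 </span></li>
  <li class="feature__item"><span> 4 </span></li>
</ul>
"""

def first_span_texts(html):
    """Return the stripped text of the first direct span child of each li."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for li in soup.find_all("li", class_="feature__item"):
        # recursive=False restricts the search to direct children of the li
        span = li.find("span", recursive=False)
        if span is not None:  # skip items that have no direct span child
            texts.append(span.get_text(strip=True))
    return texts

print(first_span_texts(SAMPLE_HTML))
```

The `None` check matters on real pages: if any `li` lacks a direct `span`, calling `.get_text()` on the result of `find` would raise an `AttributeError`.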

Since there is no way to exclude the other values using HTML selectors (they are all span tags), you will have to manually keep only the values with 'm²' in them to get your final output.

Like this:

result = list(map(int, [i.replace('m²', '').strip() for i in meters if 'm²' in i]))

Outputs:

[351, 573 ...]
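As an alternative sketch of the filtering step, a regular expression can pick out the number and skip the non-area values in one pass. The input list below reuses the kind of strings reported in the comments; `AREA` and `extract_areas` are illustrative names, not part of BeautifulSoup.

```python
import re

# Strings of the kind returned by get_text() on each span
raw = ['\n 351 m²\n ', '\n 4 \n ', '\n 3 \n ', '\n 5 \n ',
       '\n 573 m²\n ', '\n 4 \n ', '\n 270 m²\n ', '\n 3 \n ']

# One or more digits, optional whitespace, then the unit
AREA = re.compile(r"(\d+)\s*m²")

def extract_areas(texts):
    """Return the integer areas from the strings that contain 'm²'."""
    areas = []
    for text in texts:
        match = AREA.search(text)
        if match:  # strings without 'm²' (room counts etc.) are skipped
            areas.append(int(match.group(1)))
    return areas

print(extract_areas(raw))  # [351, 573, 270]
```

This sketch assumes plain integer areas. A value written with a thousands separator (for example "1.200 m²") would need extra handling before the conversion to `int`.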

Reference:

How to find children of nodes using BeautifulSoup

Rithin Chalumuri
  • Hi Rithin, many thanks for helping. I tried your code above but it is still returning the same: '\n 351 m²\n ', '\n 4 \n ', '\n 3 \n ', '\n 5 \n ', '\n 573 m²\n ', '\n 4 \n ', '\n 4 \n ', '\n 4 \n ', '\n 270 m²\n ', '\n 3 \n ', '\n 3 \n ', '\n 3 \n ', etc. I was expecting: 351, 573, ... Thanks! – Ferby Nov 05 '19 at 20:55
  • @Ferby, could you add the html input as text in the question so I can try? And your expected output? – Rithin Chalumuri Nov 05 '19 at 21:05
  • Hi, my expected output is: [351, 573, 270, 350,...], just the numbers! It is returning more than necessary. The site is: https://www.zapimoveis.com.br/aluguel/casas-de-condominio/sp+sao-paulo/?__zt=nrp%3Ab I tried to copy the HTML but I don't know how to copy all the lines, sorry! lol, it's a web scraping project. – Ferby Nov 05 '19 at 21:31
  • @Ferby, please check the updated answer. You would have to manually filter for the text containing 'm²' and convert it to integers. – Rithin Chalumuri Nov 05 '19 at 21:38
  • Rithin, thank you so much bro!! I was really stuck in this, you helped me a lot!! – Ferby Nov 06 '19 at 12:58