How can I get the first string from a div that has a div embedded beautifulsoup4

Question

I'm trying to extract prices from a website.

The code I've written can do that, but when a listing also shows the old price, it returns `None` instead of the price as a string.

This is an example of the HTML without the old price (for which my code returns a string):

<div class="xl-price rangePrice">
                            535.000 €  
                        </div>

This is an example of the HTML WITH the old price (for which my code returns `None`):

<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>

The page I'm trying to extract code from: pagelink

My code:

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    prices.append(items.string)

print(prices)

Another issue I'm having is that it returns the values like this:

'\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t', '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t'

when I only want the numbers.

Would appreciate the help!

`.string` is not the same thing as `.text`. You can read more about the former [here](https://stackoverflow.com/q/25327693/11301900); what you probably want is the latter. — AMC, Dec 01 '19 at 07:53
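To illustrate the comment above, here is a minimal sketch that parses the two HTML snippets from the question as an inline string instead of fetching the live page. The variable names (`html`, `single`, `nested`, and so on) are made up for this example. It shows that `.string` is `None` as soon as a tag has more than one child, while `.get_text()` joins all nested text, including the old price.

```python
from bs4 import BeautifulSoup

# HTML copied from the question; all variable names here are illustrative.
html = """
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
single, nested = soup.find_all("div", class_="xl-price")

# One child only: .string is that text node (whitespace included).
single_string = single.string
# Text node plus a <span>: .string cannot pick one, so it is None.
nested_string = nested.string
# get_text() concatenates every nested string, old price included.
nested_text = nested.get_text(" ", strip=True)
# stripped_strings yields the strings in document order; the first is the current price.
nested_first = next(nested.stripped_strings)

print(repr(single_string))
print(nested_string)
print(nested_text)
print(nested_first)
```

If only the current price is wanted, taking the first stripped string is probably the smallest change to the loop in the question, although it assumes the current price always comes before the old one in the markup.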

Dhruv Rajkotia · Answer 1 · 2019-12-01T11:58:01.710

Here is the sample code for your question.

import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
soup = BeautifulSoup(page.content, 'html.parser')

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    if items.string:
        # Single child: .string is the price text itself.
        result = re.findall(r'\d+\.\d+', items.string)
        if result:
            prices.append(result[0])
    else:
        # Several children (current price plus old-price span): check each one.
        for item in items.children:
            if item.string:
                result = re.findall(r'\d+\.\d+', item.string)
                if result:
                    prices.append(result[0])
                    break  # keep only the first match, i.e. the current price

print(prices)

edited Dec 01 '19 at 11:58

answered Dec 01 '19 at 07:32


It looks like the code skips the property that has the "new/old" price. Any idea how to get it to include the new price in the list? – MoofinTheExplorer Dec 01 '19 at 07:49
What’s the point of using RegEx when the program already uses BeautifulSoup? – AMC Dec 01 '19 at 10:30
As per the problem, BeautifulSoup gives the output like "\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t". So I have used regex to extract the digits from the string. – Dhruv Rajkotia Dec 01 '19 at 11:21
@MoofinTheExplorer I have updated the answer, please check it out. – Dhruv Rajkotia Dec 01 '19 at 11:58
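On the whitespace point discussed in these comments: a regular expression is not strictly required. Below is a minimal sketch using only string methods, applied to the two raw values quoted in the question. The helper name `clean_price` is hypothetical, and the code assumes that every price ends with a euro sign and uses the dot as a thousands separator.

```python
def clean_price(raw: str) -> int:
    """Convert a scraped price string into an integer number of euros."""
    # strip() removes the surrounding carriage returns, newlines, tabs and spaces.
    text = raw.strip()
    # Drop the trailing euro sign and the space left in front of it.
    text = text.rstrip("€").strip()
    # Assumption: the dot is a thousands separator, not a decimal point.
    return int(text.replace(".", ""))


# The two raw values shown in the question.
raw_prices = [
    "\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t",
    "\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t",
]
cleaned = [clean_price(p) for p in raw_prices]
print(cleaned)
```

A regex is still a reasonable choice if the page mixes other text into the same node; the string-method version simply fails loudly with a `ValueError` when the input does not look like a price.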

αԋɱҽԃ αмєяιcαη · Accepted Answer · 2019-12-01T08:12:03.267

import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

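for item in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # contents[0] is the first child of the div: the text node with the current price
    price = item.contents[0]
    # remove surrounding whitespace, then the trailing euro sign and the space before it
    print(price.strip().rstrip('€').strip())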

Output:

AMC · Answer 3 · 2019-12-02T05:44:15.500

I don’t have access to a computer right now, so consider this quasi-pseudocode:

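# div_elem is a single <div class="xl-price rangePrice"> element
# first text node that is a direct child of the div: the current price
new_price = div_elem.find(string=True, recursive=False).strip()

find_res = div_elem.find('span', attrs={'class': 'old-price'})

# old_price stays None when the listing has no old price
old_price = find_res.get_text(strip=True) if find_res else None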

I tried to keep things as easy to understand as possible.

Let me know if you have any questions :)
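For completeness, the idea above could be applied to every price block on a page roughly as follows. This is only a sketch: it runs against the two HTML snippets from the question rather than the live site, and the function name `extract_prices` is hypothetical.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, used here in place of the downloaded page.
html = """
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>
"""


def extract_prices(soup):
    """Return (current_price, old_price) pairs; old_price is None when absent."""
    pairs = []
    for div_elem in soup.find_all("div", class_="xl-price"):
        # First text node that is a direct child of the div.
        first_text = div_elem.find(string=True, recursive=False)
        new_price = first_text.strip() if first_text else None
        # Optional nested span holding the previous price.
        old_span = div_elem.find("span", class_="old-price")
        old_price = old_span.get_text(strip=True) if old_span else None
        pairs.append((new_price, old_price))
    return pairs


pairs = extract_prices(BeautifulSoup(html, "html.parser"))
print(pairs)
```

Returning pairs keeps each current price next to its old price, which a flat list would lose; whether that is useful depends on what is done with the data afterwards.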

edited Dec 02 '19 at 05:44

answered Dec 01 '19 at 08:05


@MoofinTheExplorer Alright, fingers crossed! I’ll come back to it tomorrow, and write a more complete solution. It might be useful if you shared more of your code, too. – AMC Dec 01 '19 at 08:07
Hi Alexander! Yes it did, I appreciate it. – MoofinTheExplorer Dec 02 '19 at 05:33
