
I'm trying to extract prices from a website.

The code I've written can do that, but when a listing also shows an old price, it returns None instead of the price as a string.

This is an example of the HTML without an old price (my code returns this one as a string):

<div class="xl-price rangePrice">
                            535.000 €  
                        </div>

This is an example of the HTML with an old price (my code returns None for this one):

<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>

The page I'm trying to extract prices from: pagelink

My code:

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    prices.append(items.string)

print(prices)

Another issue I'm having is that it returns the values like this:

'\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t', '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t'

when I only want the numbers.

Would appreciate the help!

MoofinTheExplorer
    `.string` is not the same thing as `.text`. You can read more about the former [here](https://stackoverflow.com/q/25327693/11301900), what you probably want is the latter. – AMC Dec 01 '19 at 07:53
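To see the difference the comment describes, here is a small offline sketch that parses the two snippets from the question (copied into a string, so no request to the site is made; it assumes `beautifulsoup4` is installed). `.string` only returns a value when the tag has exactly one child, while `.get_text()` joins every text node, including the old price.

```python
from bs4 import BeautifulSoup

# The two variants of the price <div> shown in the question
html = """
<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", {"class": "xl-price rangePrice"})

# .string: the single child string, or None when the tag has several children
# (the second div has a text node, a <span> and another text node)
strings = [div.string for div in divs]

# .get_text(): every text node below the tag, joined with a space and stripped
texts = [div.get_text(" ", strip=True) for div in divs]

print(strings)
print(texts)  # ['535.000 €', '487.000 € 497.000 €']
```

Note that `.get_text()` alone would mix the old price into the result for the second listing, which is why the answers below look at the individual child nodes instead.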

3 Answers


Here is the sample code for your question.

import re
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000")
soup = BeautifulSoup(page.content, 'html.parser')

prices = []
for items in soup.find_all("div", {"class": "xl-price rangePrice"}):
    if items.string:
        # The div holds only one text node: the current price
        result = re.findall(r'\d+\.\d+', items.string)
        prices.append(result[0])
    else:
        # The div also contains an old-price <span>, so .string is None;
        # look at its direct children and keep the first plain-text price
        for item in items.children:
            if item.string:
                result = re.findall(r'\d+\.\d+', item.string)
                if len(result) > 0:
                    prices.append(result[0])
                    break  # stop so the old price is not collected as well

print(prices)
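The regex above still returns strings such as '298.000'. If you want actual numbers, one option is a small helper like the hypothetical `parse_price` below. It is a sketch that assumes the site uses "." as a thousands separator (so '298.000 €' means 298000, and '249.500' means 249500); adjust it if that assumption does not hold. It also copes with the raw '\r\n\t...' strings from the question, since the regex simply skips the whitespace.

```python
import re


def parse_price(raw):
    """Turn a raw price string such as '\\r\\n\\t298.000 € \\r\\n' into an int.

    Assumption: '.' is a thousands separator, so '298.000' -> 298000.
    Returns None when no price-like number is found.
    """
    # One to three digits, followed by any number of '.ddd' groups
    match = re.search(r"\d{1,3}(?:\.\d{3})*", raw)
    if match is None:
        return None
    return int(match.group(0).replace(".", ""))


raw_values = ['\r\n\t\t\t\t\t\t\t\t298.000 € \r\n\t\t\t\t\t\t\t',
              '\r\n\t\t\t\t\t\t\t\t145.000 € \r\n\t\t\t\t\t\t\t']
print([parse_price(v) for v in raw_values])  # [298000, 145000]
```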
Dhruv Rajkotia
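import requests
from bs4 import BeautifulSoup

r = requests.get(
    'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000')
soup = BeautifulSoup(r.text, 'html.parser')

for item in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # The first child node is the text holding the current price; the old
    # price, when present, is in a later <span> child and is ignored
    item = item.contents[0]
    # Strip the surrounding whitespace, then drop the trailing "€"
    print(item.strip()[0:-1])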

Output:

298.000 
145.000 
275.000 
535.000 
487.000 
159.000 
325.000 
189.000 
139.000 
499.000 
520.000 
249.500 
448.000 
215.000 
225.000 
210.000 
215.000 
218.000 
232.000 
689.000 
228.000 
299.500 
169.000 
135.000 
549.000 
125.000 
160.000 
395.000 
430.000 
210.000 
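The `.contents[0]` approach can be checked offline against the two snippets from the question. The sketch below (assuming `beautifulsoup4` is installed) removes the "€" sign explicitly instead of slicing off the last character, which also avoids the trailing space visible in the output above. It still assumes the price text is always the div's first child; if the markup ever starts with a tag, `contents[0]` would be that tag instead.

```python
from bs4 import BeautifulSoup

# The two variants of the price <div> shown in the question
html = """<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>"""

soup = BeautifulSoup(html, "html.parser")

prices = []
for div in soup.find_all("div", attrs={"class": "xl-price rangePrice"}):
    # First child node: the bare text with the current price.
    # The old price lives in a later <span class="old-price"> child.
    first = div.contents[0]
    # Remove the euro sign, then the surrounding whitespace
    prices.append(first.replace("€", "").strip())

print(prices)  # ['535.000', '487.000']
```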

I don’t have access to a computer right now, so consider this quasi-pseudocode:

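for div_elem in soup.find_all('div', attrs={'class': 'xl-price rangePrice'}):
    # The current price is the div's own text node, not text inside child tags
    new_price = div_elem.find(string=True, recursive=False).strip()

    # The old price, when there is one, sits in a nested <span class="old-price">
    old_price = None
    find_res = div_elem.find('span', attrs={'class': 'old-price'})
    if find_res:
        old_price = find_res.get_text(strip=True)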

I tried to keep things as easy to understand as possible.

Let me know if you have any questions :)
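The same idea can be wrapped in a function and run offline against the question's two snippets, giving a (current price, old price) pair per listing. This is a sketch assuming `beautifulsoup4` is installed; `extract_prices` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# The two variants of the price <div> shown in the question
html = """<div class="xl-price rangePrice">
    535.000 €
</div>
<div class="xl-price rangePrice">
    487.000 €
    <span class="old-price">497.000 €<br></span>
</div>"""


def extract_prices(div):
    """Return (current_price, old_price) for one price <div>; old_price may be None."""
    # recursive=False: only the div's own text nodes, not the text inside <span>
    current = div.find(string=True, recursive=False).strip()
    old_span = div.find("span", attrs={"class": "old-price"})
    old = old_span.get_text(strip=True) if old_span else None
    return current, old


soup = BeautifulSoup(html, "html.parser")
pairs = [extract_prices(d)
         for d in soup.find_all("div", attrs={"class": "xl-price rangePrice"})]
print(pairs)  # [('535.000 €', None), ('487.000 €', '497.000 €')]
```

Keeping the old price separate (rather than discarding it) makes it easy to, for example, flag listings whose price was reduced.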

AMC