
I have two cases that I need to handle differently when scraping. Two similar classes both contain building prices, and the prices need to be added to Excel in the order they appear on the page, because they have to line up with other data I'm scraping.

The listings I'm scraping use two different classes for the price. One looks like this:

    <div class="xl-price rangePrice">
        375.000 €
    </div>

The other looks like this:

    <div class="xl-price-promotion rangePrice">
        <span>from </span> 250.000 € <br><span>to</span> 695.000 €
    </div>

My code is able to extract either one but not both. What I need it to do is go through all the prices on a search result page and append them to a list, "pricelist".

I'm doing the same for the square meters, building type, etc., and I feed each list item into an Excel file.

For this reason it is crucial that they are added to the list in page order: if they are not, the row position of the price in Excel will not match the row position of the square meters and building type.

Does anyone have an idea why my code is not able to extract both classes?

Here's my code and the page I'm trying to extract the prices from:

Getting the website and looping through the first 3 pages:

    for number in range(1, 4):
        listplace = (number - 1) * len(buildinglist1)
        immo_page = requests.get(f'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={number}',
                                 headers=header)
        soup = BeautifulSoup(immo_page.content, 'lxml')  # html parser

        pricelist = ['Price']

        for item in soup.findAll('div', attrs={'class': 'xl-price'}):
            # item = item.text.strip().split()
            try:
                for item in soup.findAll('div', attrs={'class': 'xl-price-promotion rangePrice'}):
                    temp_list = []
                    item = item.text.strip().split()
                    item.remove('from'), item.remove('€'), item.remove('to'), item.remove('€')
                    for price in item: temp_list.append(price.replace('.', ''))
                    print(temp_list)
                    temp_list = [int(temp_list[0]) + int(temp_list[1])]
                    print(temp_list)
                    for item in temp_list: pricelist.append(item / 2)
            except ValueError:
                for item in soup.findAll('div', attrs={'class': 'xl-price rangePrice'}):
                    item = item.contents[0]
                    item = item.strip()[0:-1]
                    item = item.replace(' ', '')
                    item = item.replace('.', '')
                    pricelist.append(item)
        print(pricelist)

So this is what I tried to get the prices and append them to a list.
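
A likely reason only one variant shows up: the outer loop and both inner loops each search the whole page again, so the two variants are never visited together in a single pass, and prices end up repeated or skipped. Below is a minimal sketch of a single pass, assuming (as in the two snippets above) that both variants share the rangePrice class. The HTML string is a stand-in for the real page, and the name texts is only for illustration.

```python
from bs4 import BeautifulSoup

# Stand-in for one search result page, built from the two snippets above.
html = """
<div class="xl-price rangePrice">
    375.000 €
</div>
<div class="xl-price-promotion rangePrice">
    <span>from </span> 250.000 € <br><span>to</span> 695.000 €
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# One selector on the shared class returns both variants in document
# order, and every price div is visited exactly once.
texts = [div.get_text(' ', strip=True) for div in soup.select('div.rangePrice')]
print(texts)
```

Because each div produces exactly one entry, the list stays in the same order as the listings on the page.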

The output when you only use one of the two branches (in this example, the output of the code that runs in the "except" block):

['Price', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', 
'499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', 
'210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', 
'689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000', '275000', '298000', '535000', '145000', '159000', '487000', '189000', '325000', '139000', '499000', '520000', '249500', '448000', '215000', '225000', '210000', '215000', '218000', '232000', '689000', '228000', '299500', '135000', '549000', '125000', '169000', '160000', '395000', '430000', '210000']
['Price', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', 
'445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', 
'325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000', '210000', '325000', '375000', '135000', '385000', '285000', '339000', '125000', '225000', '635000', '445000', '689000', '205000', '438000', '595000', '180000', '320000', '48000', '165000', '150000', '119000']
['Price', '235000']

Every "Price" indicates a new page. But as you can see, page 3 is incomplete: it only shows the first single-price value it encounters and skips the double-price values.

  • When a listing has more than one price, I take the average of the prices and append it to the pricelist.
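
The averaging rule above can be sketched with the standard library alone. This assumes every amount is written like "375.000 €", with dots as thousands separators; parse_price is a hypothetical helper name, not part of the original code.

```python
import re
from statistics import mean


def parse_price(text):
    """Return the average of all euro amounts in text, or None if there are none."""
    # Capture each run of digits and dots that sits directly before a euro sign.
    amounts = [int(m.replace('.', '')) for m in re.findall(r'(\d[\d.]*)\s*€', text)]
    # Returning None (rather than skipping) keeps one entry per listing,
    # so the price column stays aligned with the other columns.
    return mean(amounts) if amounts else None


print(parse_price('375.000 €'))
print(parse_price('from 250.000 € to 695.000 €'))
```

A single price comes back unchanged, and a "from ... to ..." range comes back as the midpoint of the two amounts.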

Much appreciated!

MoofinTheExplorer

2 Answers

5
import requests
from bs4 import BeautifulSoup
import csv

types = []
sqs = []
prices = []
des = []
links = []

for page in range(1, 11):
    print(f"Extracting Page# {page}")
    r = requests.get(
        f"https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={page}")
    soup = BeautifulSoup(r.text, 'html.parser')
    # Building type
    for ty in soup.findAll('div', attrs={'class': 'title-bar-left'}):
        ty = ty.text.strip()
        types.append(ty)
    # Surface: keep only the number in front of "m²"
    for sq in soup.select('div[class*="surface-ch"]'):
        sq = sq.text.strip()
        if 'm²' in sq:
            sq = sq[0:sq.find('m')]
        else:
            sq = 'N/A'
        sqs.append(sq)
    # Price: any div whose class attribute contains "-price"
    for price in soup.select('div[class*="-price"]'):
        price = price.get_text(strip=True)
        if 'from' in price:
            price = price.replace('from', 'From: ')
            price = price.replace('to', ' To: ')
        else:
            price = price[0:price.find('€') + 1]
        prices.append(price)
    # Description
    for de in soup.select('div[class*="-desc"]'):
        de = de.get_text(strip=True)
        des.append(de)
    # Links: separate names so the page counter is not shadowed
    for a in soup.findAll('a'):
        href = a.get('href')
        if href is not None and 'for-sale/leuven/3000/id' in href:
            links.append(href)
final = []
for item in zip(types, sqs, prices, des, links):
    final.append(item)
with open('output.csv', 'w+', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Type', 'Size', 'Price', 'Desc', 'Link'])
    writer.writerows(final)
    print("Operation Completed")
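
One caveat when columns are collected in separate lists and combined with zip: if any listing lacks a field, the lists end up with different lengths, zip stops at the shortest one, and the rows after the gap no longer belong together. The sketch below uses made-up values and a hypothetical helper, columns_aligned, to show the effect and a cheap check for it.

```python
from itertools import zip_longest

# Made-up columns: one of the three listings has no price on the page.
types = ['Apartment', 'Flat/Studio', 'Apartment']
prices = ['275.000 €', '145.000 €']

# zip() stops at the shortest column, so the third listing disappears.
short_rows = list(zip(types, prices))

# zip_longest() keeps every row and pads the gap, but the padding always
# lands at the end, which is only right if the last listing was the one
# without a price. A length check catches the mismatch early instead.
padded_rows = list(zip_longest(types, prices, fillvalue='N/A'))


def columns_aligned(*columns):
    """Return True when all columns have the same number of entries."""
    return len({len(col) for col in columns}) <= 1


print(short_rows)
print(padded_rows)
print(columns_aligned(types, prices))
```

Running such a check per page, before appending to the final lists, would point at the page where a field went missing.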


2

This script grabs the data from pages 1 to 10 and saves it as a CSV file. The price is the average (if more than one is found for the ad):

import re
import csv
import requests
from bs4 import BeautifulSoup
from statistics import mean

url = 'https://www.immoweb.be/en/search/apartment/for-sale/leuven/3000?page={}'

data = []
for page in range(1, 11):  # pages 1 to 10
    soup = BeautifulSoup(requests.get(url.format(page)).text, 'html.parser')

    for result, price, surface, desc, link in zip( soup.select('.title-bar-left'),
                                        soup.select('.rangePrice'),
                                        soup.select('.xl-surface-ch, .l-surface-ch, .m-surface-ch'),
                                        soup.select('.xl-desc, .l-desc, .m-desc'),
                                        soup.select('.result-xl > a[target="IWEB_MAIN"], .result-l > a[target="IWEB_MAIN"], .result-m > a[target="IWEB_MAIN"]') ):
        s = (re.findall(r'\s*(.*?m²)\s*', surface.get_text(strip=True)) or '-')[0]
        bed = (re.findall(r'\s*([\s\d\-]+bed.)\s*', surface.get_text(strip=True)) or '-')[0]

        old_price = price.select_one('.old-price')
        if old_price:
            old_price.extract()

        price = mean( [int(''.join(re.findall(r'\d+', v))) for v in re.findall(r'\s*(.*?)\s*€', price.text)] )

        data.append([result.get_text(strip=True),
              price,
              s, bed, desc.get_text(strip=True)])

        print('{:<65} {:<10} {:<20} {:<20} {:<70}'.format(*data[-1]))

        data[-1] += [link['href']]

with open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out, delimiter=',',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(data)

Prints:

Apartment                                                         275000     70 m²                2 bed.               energiezuinig app, hartje Leuven, 2 slpk, fietsenstalling             
Apartment                                                         298000     84 m²                2 bed.               App. 2 slpk in de unieke residentie Keizershof!                       
Apartment                                                         535000     80 m²                2 bed.               appartement                                                           
Flat/Studio                                                       145000     32 m²                1 bed.               studio                                                                
Flat/Studio                                                       159000     22 m²                1 bed.               studio                                                                
Apartment                                                         487000     149 m²               3 bed.               Modern spatious apartment within the ring of Leuven                   
Flat/Studio                                                       189000     30 m²                1 bed.               flat                                                                  
Apartment                                                         325000     75 m²                2 bed.               appartement                                                           
Flat/Studio                                                       139000     23 m²                1 bed.               studio                                                                
Apartment                                                         499000     104 m²               2 bed.               appartement                                                           
Apartment                                                         249500     95 m²                2 bed.               appartement                                                           

... and so on.
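
The surface and bedroom lines in the script rely on a small fallback idiom: re.findall returns an empty list when nothing matches, so "or '-'" substitutes a dash before the [0] index is applied. The sketch below isolates that idiom; first_match is a hypothetical helper and the sample string is an assumed shape for the surface text, not copied from the site.

```python
import re


def first_match(pattern, text, default='-'):
    """Return the first regex match in text, or default if there is none."""
    # An empty findall result is falsy, so `or default` takes over.
    # [0] then returns the first character of default, which is why the
    # default should be a single character such as '-'.
    return (re.findall(pattern, text) or default)[0]


sample = '70 m²2 bed.'  # assumed shape of the text after get_text(strip=True)
surface = first_match(r'\s*(.*?m²)\s*', sample)
beds = first_match(r'\s*([\s\d\-]+bed.)\s*', sample)
missing = first_match(r'\s*(.*?m²)\s*', 'studio')
print(surface, beds, missing)
```

Every listing therefore yields a value for each column, even when the surface or bedroom count is absent, which keeps the rows the same width.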

[Screenshot: the output file opened in LibreOffice Calc]

Andrej Kesely
    Lol did you just write the code I spent more than a week trying to figure out in a couple of minutes? It's a punch in the gut and an amazing learning experience now that I can see how you did what I've been trying to do for a while now! Thank you very much! I will have a look at it asap and if you don't mind ask you about some of the things you've done :) – MoofinTheExplorer Dec 06 '19 at 11:22