I built a web scraper with BeautifulSoup for the German yellow pages. It works well so far.
However, the problem is that there is a "Load More" button which shows 10 more results. The URL does NOT change when clicking it. Since I am a beginner and wrote the code following a YouTube tutorial, I really don't know how to get my web scraper to collect all results from the page.
Current behaviour:
Running the web scraper gets me 50 results. After those 50 results there is a "Load More" button on the page.
Expected behaviour:
I can define how many times the web scraper "clicks" the "Load More" button to get more results and save them in a list.
Obviously, since the URL doesn't change, I cannot just append an {x} to the end of the link and loop over it. I also read something about Selenium but couldn't figure it out by myself.
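From what I read, Selenium can render the page in a real browser and click the button for me, so I tried to piece together the sketch below. I'm not sure about the details: the CSS selector for the button (#mod-LoadMore--button) is just my guess and would need to be checked in the browser dev tools, and the waits are probably not the cleanest way to do it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

def get_rendered_html(url, clicks):
    # open the page in a real browser so the JavaScript behind "Load More" runs
    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    driver.get(url)
    wait = WebDriverWait(driver, 10)
    for _ in range(clicks):
        try:
            # selector is only a guess, I would verify it in the dev tools
            button = wait.until(EC.element_to_be_clickable(
                (By.CSS_SELECTOR, '#mod-LoadMore--button')))
            driver.execute_script("arguments[0].click();", button)
            time.sleep(2)  # crude wait so the next 10 results can load
        except TimeoutException:
            break  # button not found anymore, so everything seems to be loaded
    html = driver.page_source
    driver.quit()
    return html

I used execute_script for the click because I read that a normal .click() can fail when something (like a cookie banner) overlaps the button, but I don't know if that is the right approach here.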
import requests
import pandas as pd
from bs4 import BeautifulSoup

main_list = []

def extract(url):
    # fetch the results page and return all result articles
    headers = {'User-Agent': 'x'}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    return soup.find_all('article', class_='mod mod-Treffer')

def transform(articles):
    # pull name, phone number and (optional) website out of each article
    for item in articles:
        name = item.find('h2', {'data-wipe-name': 'Titel'}).text
        tel = item.find('p', class_='mod-AdresseKompakt__phoneNumber').text
        try:
            website = item.find('a', class_='contains-icon-homepage gc-btn gc-btn--s')['href']
        except (AttributeError, TypeError):
            # no website link present for this entry
            website = ''
        business = {
            'Name': name,
            'Website': website,
            'Telefonnummer': tel,
        }
        main_list.append(business)

def load():
    # write the collected results to a CSV file
    df = pd.DataFrame(main_list)
    df.to_csv('stb_fürth.csv', index=False)

articles = extract('german yellow pages')  # placeholder for the actual results URL
transform(articles)
load()
print('CSV Datei erstellt')
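If the Selenium sketch from above works, I guess I could reuse my existing transform() and load() on the rendered HTML instead of going through requests, roughly like this (again just a sketch, with the same placeholder URL and an arbitrary number of clicks):

html = get_rendered_html('german yellow pages', clicks=5)  # placeholder URL, 5 extra "Load More" clicks
soup = BeautifulSoup(html, 'html.parser')
articles = soup.find_all('article', class_='mod mod-Treffer')
transform(articles)
load()

Is that roughly the right direction, or is there a simpler way to get all the results?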