I am trying to scrape house price data using Selenium and BeautifulSoup. Here is the code I am using:
import pickle
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.edge.service import Service

driver_path = r"C:\Users\berid\python\webdriver\msedgedriver.exe"
service = Service(driver_path)
driver = webdriver.Edge(service=service)
articles_list = []
# city_urls is defined earlier (not shown)
for city_url in city_urls:
    city_name = city_url.split('/')[-1]
    # defining maximum number of pages for each city_url
    driver.get(f'https://www.housing_website.com/city/{city_url}/page-1')
    time.sleep(5)
    main_html = driver.page_source
    main_soup = BeautifulSoup(main_html, 'html.parser')
    max_pages_tag = main_soup.select_one('div[class="descriptionAndModeContainer"] div[class="homes summary"]')
    # take the number between "of" and "home" in the summary text and divide by 40
    max_pages = int(max_pages_tag.text.split('of')[-1].split('home')[0].strip()) // 40 if max_pages_tag else None
    for page in range(1, 500):
        try:
            page_url = f'https://www.housing_website.com/city/{city_url}/page-{page}'
            driver.get(page_url)
            time.sleep(5)
            html = driver.page_source
            soup = BeautifulSoup(html, 'html.parser')
            articles = soup.select('div[class="map collapsedList"] div[class="HomeCardContainer defaultSplitMapListView"] div[class="bottomV2"]')
            for article in articles:
                articles_list.append(article.text)
            print(city_name, f'{page} out of {max_pages}')
        except Exception:
            # stop paging through this city on any scraping error
            break
        if page % 10 == 0:
            # save every 10 pages (named after the pages actually collected) and start a new batch
            with open(f'csv_files/home_prices/{city_name}_{page-9}-{page}.pickle', 'wb') as f:
                pickle.dump(articles_list, f)
            articles_list = []
        if max_pages is not None and page > max_pages:
            break
        elif max_pages is None and page == 101:
            break
driver.quit()
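To make the max_pages line easier to follow, here is roughly what it does in isolation. This is only a sketch: the helper name parse_max_pages and the sample string are made up for illustration, and I am assuming the summary text ends in something like "of 1234 homes" and that the site shows 40 listings per page. The comma handling is an extra guard that the code above does not have.

```python
def parse_max_pages(summary_text, per_page=40):
    """Return the page count implied by text like '... of 1234 homes', or None."""
    if not summary_text:
        return None
    # Same split logic as in the scraper: keep what sits between "of" and "home".
    count_part = summary_text.split('of')[-1].split('home')[0].strip()
    # Assumption: the count may contain a thousands separator such as "1,234".
    count_part = count_part.replace(',', '')
    if not count_part.isdigit():
        return None
    # Integer division, as in the scraper; a partial last page is not counted here.
    return int(count_part) // per_page


# Hypothetical summary text, not copied from the real site.
print(parse_max_pages('Showing 40 of 1234 homes'))
```

Because the division rounds down, the scraper loop only stops once page > max_pages, which is what lets it still visit a partial last page.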
In a Jupyter notebook, the scraper freezes on a certain page, and when I run it in a CMD terminal I get this error:
[1500:11068:0715/140721.394:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140723.018:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140723.160:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140723.343:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140723.604:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140723.822:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:11068:0715/140724.910:ERROR:fallback_task_provider.cc(124)] Every renderer should have at least one task provided by a primary task provider. If a "Renderer" fallback task is shown, it is a bug. If you have repro steps, please file a new bug and tag it as a dependency of crbug.com/739782.
[1500:13988:0715/140729.803:ERROR:cert_issuer_source_aia.cc(34)] Error parsing cert retrieved from AIA (as DER):
ERROR: Couldn't read tbsCertificate as SEQUENCE
ERROR: Failed parsing Certificate
How can I modify the code to avoid the freeze and this error?