I am trying to scrape company reviews from Glassdoor using BeautifulSoup, but I cannot extract anything from the site. This is the code I am using:
from requests import get
from bs4 import BeautifulSoup
url = "https://www.glassdoor.com/Reviews/The-Wonderful-Company-Reviews-E1005987_P2.htm?sort.sortType=RD&sort.ascending=false"
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')
html_soup
The code above does not extract any of the page content. Instead, the response is an HTTP 403 page saying 'Bots not allowed'. The output is shown below:
<!DOCTYPE html>
<html><head><title></title><style type="text/css">H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}.line {height: 1px; background-color: #525D76; border: none;}</style> </head><body><h1>HTTP Status 403 - Bots not allowed</h1><div class="line"></div><p><b>type</b> Status report</p><p><b>message</b> <u>Bots not allowed</u></p><p><b>description</b> <u>Access to the specified resource has been forbidden.</u></p><hr class="line"/><h3>Apache Tomcat</h3></body></html>
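To make the failure easier to spot before parsing, the check could be sketched roughly as below. This is only a minimal sketch: `is_blocked` is a hypothetical helper name of my own (not part of requests or BeautifulSoup), and it assumes the block page always comes back with status 403 or contains the text 'Bots not allowed', which is all I have observed so far.

```python
def is_blocked(status_code, body):
    """Return True when a response looks like the 'Bots not allowed' page.

    Assumption: the block is signalled by HTTP status 403 or by the
    phrase 'Bots not allowed' appearing somewhere in the response body.
    """
    return status_code == 403 or "Bots not allowed" in body


# Shortened copy of the heading from the output above.
sample_body = "<h1>HTTP Status 403 - Bots not allowed</h1>"
print(is_blocked(403, sample_body))
```

With the real request, this would be called as `is_blocked(response.status_code, response.text)` before passing the text to BeautifulSoup.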
I am new to web scraping. Can anybody guide me on how to extract the reviews from Glassdoor?