Web scraping with age verification

Question

Hello, I want to scrape data from a site that has an age-verification pop-up, using Python 3.x and BeautifulSoup. I can't get to the underlying text and images without clicking "Yes" on the "Are you over 21?" prompt. Thanks for any help.

EDIT: Thanks. With some help from a comment, I see that I can use cookies, but I'm not sure how to manage, store, and send cookies with the requests package.
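A minimal sketch of managing cookies with requests, assuming the goal is to reuse them across runs of the script. Within a single run, a `requests.Session` already stores cookies the server sets and sends them back on later requests; the two helpers below (hypothetical names `export_cookies` and `import_cookies`) only serialize the session's cookie jar to JSON and load it back. Note that `requests.utils.dict_from_cookiejar` keeps names and values only, not domains, paths, or expiry dates.

```python
import json

import requests


def export_cookies(session: requests.Session) -> str:
    """Serialize the session's cookies as a JSON object of name -> value."""
    # dict_from_cookiejar drops domain/path/expiry metadata; fine for one site.
    return json.dumps(requests.utils.dict_from_cookiejar(session.cookies))


def import_cookies(session: requests.Session, data: str) -> None:
    """Load cookies produced by export_cookies() into the session's jar."""
    session.cookies.update(json.loads(data))
```

Usage: after a run, `Path("cookies.json").write_text(export_cookies(session))`; at the start of the next run, `import_cookies(session, Path("cookies.json").read_text())` (with `from pathlib import Path`). Every later `session.get(...)` then sends those cookies automatically.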

With some help from another user, I'm now using the selenium package, so that it will also work in case the pop-up is a graphical overlay (I think?). I'm having trouble getting it to work with geckodriver, but I'll keep trying! Thanks again for all the advice, everyone.

EDIT 3: OK, I've made progress: I can get the browser window to open using geckodriver! Unfortunately, it doesn't like the link specification, so I'm posting again. The link for clicking "yes" on the age verification is buried in the page as something called mlink...

EDIT 4: Made some progress; the updated code is below. I managed to find the element in the page's HTML, and now I just need to click the link.

```python
import time

from selenium import webdriver
from bs4 import BeautifulSoup

# executable_path is optional; if omitted, Selenium searches PATH for geckodriver.
driver = webdriver.Firefox(executable_path=r'/Users/jeff/Documents/geckodriver')

url = 'https://www.shopharborside.com/oakland/#/shop/412'
driver.get(url)

# Click the element with class hhc_modal-body (click() takes no arguments).
driver.find_element_by_class_name('hhc_modal-body').click()

# Wait 1 second for the modal to close.
time.sleep(1)

pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')

# You can now work with the soup.
print(soup.prettify())
```

EDIT 5: Stuck again; here is the current code. I seem to have isolated the element `myBtnYes`, but running the code raises this error: `ElementClickInterceptedException: Message: Element is not clickable at point (625,278.5500030517578) because another element obscures it`

```python
import time

from selenium import webdriver
from bs4 import BeautifulSoup

# executable_path is optional; if omitted, Selenium searches PATH for geckodriver.
driver = webdriver.Firefox(executable_path=r'/Users/jeff/Documents/geckodriver')

url = 'https://www.shopharborside.com/oakland/#/shop/412'
driver.get(url)

# Click the "Yes" button on the age-verification modal.
# This is the call that fails with ElementClickInterceptedException.
driver.find_element_by_id('myBtnYes').click()

# Wait 1 second for the modal to close.
time.sleep(1)

pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')

# You can now work with the soup.
print(soup.prettify())
```

So you would need to click that button using Python, then. Find the form responsible for it, fill in the parameters, and send it! — ForceBru, Jan 28 '18 at 15:58
You can use this cookie, `document.cookie = "ageConfirmation=true;";`: send **ageConfirmation=true** with your request, since the website checks it to decide whether to show the age-confirmation prompt. — zamir, Jan 28 '18 at 16:02
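Following zamir's suggestion, a minimal sketch of sending the `ageConfirmation=true` cookie with requests instead of clicking the pop-up. It assumes the cookie name and value from the comment are what the site actually checks (not verified here); `make_age_confirmed_session` and `fetch_html` are hypothetical helper names. One caveat: everything after `#` in `https://www.shopharborside.com/oakland/#/shop/412` is a URL fragment, which is never sent to the server, so the product listing is probably rendered by JavaScript and the HTML that requests receives may not contain it. In that case the Selenium route is still needed.

```python
import requests


def make_age_confirmed_session(domain: str = "www.shopharborside.com") -> requests.Session:
    """Return a Session that sends ageConfirmation=true to `domain`."""
    session = requests.Session()
    # Cookie name/value from the comment; restricting it to one domain keeps
    # it from being sent to other hosts this session may talk to.
    session.cookies.set("ageConfirmation", "true", domain=domain)
    return session


def fetch_html(session: requests.Session, url: str) -> str:
    """GET `url` with the session's cookies and return the response body."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return response.text
```

Usage: `html = fetch_html(make_age_confirmed_session(), "https://www.shopharborside.com/oakland/")`, then `BeautifulSoup(html, 'html.parser')`.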

score 1 · Answer 1 · answered Jan 28 '18 at 20:18

If your aim is to click the verification button, use Selenium: `pip install selenium`, then get geckodriver (Firefox) or chromedriver (Chrome).

```python
# Mossein~King (hi, I'm here to help)
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup  # bs4 is the package name for BeautifulSoup 4 on Python 3

# Headless mode (no visible window). This will save you a bunch of research time (trust me).
options = Options()
options.add_argument("--headless")
driver = webdriver.Firefox(options=options)

# For a graphical (visible) browser instead; you need geckodriver for Firefox:
# driver = webdriver.Firefox()

url = 'your-url'
driver.get(url)

# Get the link to click,
# e.g. if the page has <a class='MosseinKing'>
driver.find_element(By.XPATH, "//a[@class='MosseinKing']").click()

# Wait 1 second in case of transitions.
time.sleep(1)

pagesource = driver.page_source
soup = BeautifulSoup(pagesource, 'html.parser')

# You can now enjoy the soup.
print(soup.prettify())
```
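Once `soup` holds the page, getting at the underlying text and images the question asks about is plain BeautifulSoup. A sketch, assuming images are ordinary `<img src=...>` tags (lazy-loaded images sometimes keep the real URL in another attribute such as `data-src`, which this does not handle); `extract_text_and_images` is a hypothetical helper. Relative `src` values are resolved against the page URL with `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_text_and_images(html, base_url):
    """Return (visible text, list of absolute image URLs) from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # Join all text nodes with single spaces, trimming whitespace around each.
    text = soup.get_text(separator=" ", strip=True)
    # src=True keeps only <img> tags that actually have a src attribute.
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]
    return text, images
```

Usage: `text, images = extract_text_and_images(driver.page_source, driver.current_url)`.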

Managed to install Selenium. Firefox is fine, but I'm having trouble with geckodriver: I seem to have the driver, but I'm not sure how to call it in the code. I'm running macOS Sierra. What about Chrome? Any other tips? — thesmeagol, Jan 29 '18 at 10:51
Just add geckodriver to your [**PATH**](https://stackoverflow.com/questions/40388503/how-to-put-geckodriver-into-path). Check the link. — Keyur Potdar, Jan 29 '18 at 16:49
Hi, thanks! I made progress and found the element; now I need to get it to click. See the edited code in the question above. — thesmeagol, Jan 30 '18 at 13:48
