
I have a simple script where I want to scrape a menu from this URL:

https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822

When I inspect the page with dev tools, I can see that the menu is contained in the section <div class="menu-area" id="section_1026228">

My script is fairly simple:

```python
import requests
from bs4 import BeautifulSoup

venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'

response = requests.get(venue_url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

menu = soup.find('div', {'class': 'menu-area'})
print(menu.text)
```

I have tried this on a locally saved copy of the page and it works. But when I fetch the live URL with the requests library, it does not work: it cannot find the div and throws this error:

```
print(menu.text)
AttributeError: 'NoneType' object has no attribute 'text'
```

which basically means it cannot find the div. Does anyone know why this is happening and how to fix it?
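The `AttributeError` appears because `soup.find()` returns `None` when nothing matches, so the real question is what HTML `requests` actually received. A minimal sketch of a defensive lookup that fails with a clearer message is below; `find_menu_area` is a hypothetical helper name, and the two HTML snippets are made-up stand-ins for a logged-in and a logged-out page, not Untappd's real markup.

```python
from bs4 import BeautifulSoup


def find_menu_area(html: str):
    """Return the first <div class="menu-area"> in `html`, or raise a clear error.

    soup.find() returns None when nothing matches, which is what makes
    menu.text fail with "AttributeError: 'NoneType' object has no attribute 'text'".
    """
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("div", class_="menu-area")
    if menu is None:
        # The HTML that was fetched differs from what the browser showed
        # (for example a logged-out version of the page).
        raise LookupError("no <div class='menu-area'> in the fetched HTML; "
                          "the server may be returning a different page")
    return menu


# Made-up sample pages for illustration only.
logged_in_html = '<div class="menu-area" id="section_1026228"><p>IPA</p></div>'
logged_out_html = "<div class='venue-header'>Sign in to see the menu</div>"

print(find_menu_area(logged_in_html).text)
```

With the live site, writing `response.text` to a file and opening it in a browser is a quick way to see which version of the page the script was served.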

Tendekai Muchenje
  • I'm not seeing a `menu-area` on this page. Do you need to be logged in to see it? – Andrej Kesely Jan 11 '23 at 00:23
  • @AndrejKesely Interesting. It seems so. I just logged out in my browser and it showed me a different page. However, my script has no login part at all. Not even sure how that would work – Tendekai Muchenje Jan 11 '23 at 00:40
  • Probably this is relevant: https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module – Andrej Kesely Jan 11 '23 at 00:43
  • As @AndrejKesely said, you should log in first by creating a `requests.Session()` and making a POST request. Furthermore, the ID you provided may change over time, as I suppose it is associated with your account. Therefore, I recommend searching for a more general identifier or selector. – Parsa Abbasi Jan 11 '23 at 15:01
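To illustrate the "more general selector" suggestion, here is a small sketch using BeautifulSoup's CSS selector support. The HTML is made up for the example (the second section id is hypothetical); the point is that selecting on the `menu-area` class, or on an id prefix, keeps working if the numeric id changes.

```python
from bs4 import BeautifulSoup

# Made-up markup: two menu sections with numeric ids that may change.
html = """
<div class="menu-area" id="section_1026228"><h4>Draft</h4></div>
<div class="menu-area" id="section_1026229"><h4>Bottles</h4></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Select by class instead of by the specific id.
sections = soup.select("div.menu-area")
print([s.get_text(strip=True) for s in sections])

# An attribute-prefix selector also matches any id starting with "section_".
by_prefix = soup.select('div[id^="section_"]')
```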

1 Answer


> I just logged out from my browser and it showed me a different page. However, my script has no login part at all. Not even sure how that would work

You can log in with `requests.Session`. (This doesn't work for every site, but so far it seems to be enough for this one.)

```python
import requests

venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'

sess = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'username': 'YOUR_EMAIL/USERNAME', 'password': 'YOUR_PASSWORD'}

# The session stores any cookies set by the login response
loginResp = sess.post('https://untappd.com/login', headers=headers, data=data)
print(loginResp.status_code, loginResp.reason, 'from', loginResp.url)  # should print 200 OK ...

# Reuse the same session so the login cookies are sent with this request
response = sess.get(venue_url, headers=headers)
## CAN CONTINUE AS BEFORE ##
```

I've edited my answer to one of your previous questions about this site so that it accepts cookies, which lets the site treat you as logged in. For example:

```python
# scrape_untappd_menu is defined in my answer to your previous question
# venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'
gloryMenu = scrape_untappd_menu(venue_url, cookies=sess.cookies)
```

will collect the following data:

[Screenshot of the collected menu data]

Note: They have a captcha on the login page, so I was worried it would be too hard to automate. If it becomes an issue, you can probably still log in with your browser before going to the page, then paste the request from your network log into curlconverter to get the cookies as a dictionary. Of course, the process is then no longer fully automated, since you'll have to repeat this manual login every time the cookies expire (which could be within a few hours). If you want to automate the login at that point, you might have to use browser automation, for example with Selenium.
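For the manual route described in the note, a minimal sketch of loading a curlconverter-style cookie dictionary into a session might look like this. `COOKIE_NAME`/`COOKIE_VALUE` are placeholders, not Untappd's real cookie names; you would paste the dictionary generated from your own logged-in browser request.

```python
import requests

# Placeholder cookies: replace with the dict curlconverter produces
# from your own logged-in browser request.
browser_cookies = {
    "COOKIE_NAME": "COOKIE_VALUE",
}

sess = requests.Session()
sess.headers.update({"User-Agent": "Mozilla/5.0"})
# Store the cookies in the session's jar so later sess.get() calls send them.
sess.cookies.update(browser_cookies)

# response = sess.get(venue_url)  # then parse response.text as before
```

Putting the cookies on the session (rather than passing `cookies=` on each call) means every subsequent request, including redirects, reuses them until they expire.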

Driftr95