div not showing up in html from url using requests library and bs4

Question

I have a simple script where I want to scrape a menu from a url:

https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822

When I inspect the page using dev tools, I can see that the menu is contained in the section <div class="menu-area" id="section_1026228">.

So my script is fairly simple:

import requests
from bs4 import BeautifulSoup

venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'

response = requests.get(venue_url, headers = {'User-agent': 'Mozilla/5.0'})
soup = BeautifulSoup(response.text, 'html.parser')

menu = soup.find('div', {'class': 'menu-area'})
print(menu.text)

I have tried this on a locally saved copy of the page and it works. But when I fetch the live URL with the requests library, it cannot find the div and throws this error:

print(menu.text)
AttributeError: 'NoneType' object has no attribute 'text'

which means soup.find() returned None, i.e. it could not find the div. Does anyone know why this is happening and how to fix it?
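A quick way to see what requests actually received is to check the raw HTML before parsing and to fail with a clear message instead of an AttributeError. Dev tools shows the page as rendered for your logged-in browser, which can differ from what the server sends to an anonymous script. A minimal sketch; `find_menu_div` is a hypothetical helper name, and the class name `menu-area` is taken from the question:

```python
from bs4 import BeautifulSoup


def find_menu_div(html):
    """Return the first <div class="menu-area">, or raise a descriptive error.

    soup.find() returns None when nothing matches, which is what causes
    the AttributeError on menu.text in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    menu = soup.find("div", class_="menu-area")
    if menu is None:
        # The HTML the server sent has no menu section; one possible cause
        # is that the menu is only rendered for logged-in users.
        raise LookupError(
            "no <div class='menu-area'> in the response; "
            "the page may require a login"
        )
    return menu


# Hypothetical usage with the question's code (needs network access):
# response = requests.get(venue_url, headers={"User-agent": "Mozilla/5.0"})
# print("menu-area" in response.text)  # quick check on the raw text
# print(find_menu_div(response.text).get_text(strip=True))
```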

I'm not seeing a `menu-area` on this page. Do you need to be logged in to see it? — Andrej Kesely, Jan 11 '23 at 00:23
@AndrejKesely Interesting, it seems so. I just logged out in my browser and it showed me a different page. However, my script has no login step at all, and I'm not even sure how that would work. — Tendekai Muchenje, Jan 11 '23 at 00:40
Probably this is relevant: https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module — Andrej Kesely, Jan 11 '23 at 00:43
As @AndrejKesely said, you should log in first by creating a `requests.Session()` and making a POST request. Furthermore, the ID you provided may change over time, as I suppose it is associated with your account. Therefore, I recommend searching for a more general identifier or selector. — Parsa Abbasi, Jan 11 '23 at 15:01
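Following the suggestion to avoid the account-specific id, here is a sketch that matches the menu sections by class with a CSS selector. `menu_sections` is a hypothetical helper; the `div.menu-area` selector comes from the question, and the alternative prefix match `div[id^="section_"]` is an assumption based on the single id shown above, not a confirmed property of the site:

```python
from bs4 import BeautifulSoup


def menu_sections(html, selector="div.menu-area"):
    """Return every element matching a CSS selector that ignores the numeric id.

    The default matches on class only, so section_1026228 or any other
    id works. Pass 'div[id^="section_"]' to match on the id prefix instead.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.select(selector)
```

For example, `[div.get_text(strip=True) for div in menu_sections(response.text)]` would collect the text of each section, provided the response actually contains the logged-in menu.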

score 0 · Answer 1 · answered Jan 13 '23 at 01:37

I just logged out from my browser and it showed me a different page. However, my script has no login part at all. Not even sure how that would work

You can log in with requests.Session. (This doesn't work with all sites, but it seems to be enough for this site so far.)

import requests

venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'

sess = requests.Session()
headers = {'User-Agent': 'Mozilla/5.0'}
data = {'username': 'YOUR_EMAIL/USERNAME', 'password': 'YOUR_PASSWORD'}

# the session keeps any cookies the login response sets
loginResp = sess.post('https://untappd.com/login', headers=headers, data=data)
print(loginResp.status_code, loginResp.reason, 'from', loginResp.url)  ## should print 200 OK...

# reuse the same session so the login cookies are sent with this request
response = sess.get(venue_url, headers=headers)
## CAN CONTINUE AS BEFORE ##
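Since both requests above send the same User-Agent, the header can be set once on the session instead of being passed to every call. A minimal sketch; `make_session` is a hypothetical helper, and the usage lines reuse the placeholder credentials from the snippet above:

```python
import requests


def make_session(user_agent="Mozilla/5.0"):
    """Create a Session that sends the same User-Agent on every request.

    Headers stored in session.headers are merged into each request made
    through the session, so headers=... doesn't need to be repeated.
    """
    sess = requests.Session()
    sess.headers.update({"User-Agent": user_agent})
    return sess


# Hypothetical usage (needs network access and real credentials):
# sess = make_session()
# sess.post("https://untappd.com/login",
#           data={"username": "YOUR_EMAIL/USERNAME", "password": "YOUR_PASSWORD"})
# response = sess.get(venue_url)
# print("menu-area" in response.text)
```

Checking for `menu-area` in the venue page is a more direct test than the login status code: a rejected login form may still answer with 200 OK.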

I've edited my solution to one of your previous questions about this site to include cookies so that the site will treat you as logged in. For example:

# venue_url = 'https://untappd.com/v/glory-days-grill-of-ellicott-city/3329822'
gloryMenu = scrape_untappd_menu(venue_url, cookies=sess.cookies)

will collect the following data:

Note: The site shows a captcha when logging in, so I was worried the login would be too hard to automate. If that becomes an issue, you can [probably] still log in with your browser before going to the page, then paste the request from your browser's network log into curlconverter to get the cookies as a dictionary. Of course, the process is then no longer fully automated, since you'll have to repeat this manual login every time the cookies expire (which could be within a few hours). If you want to automate the login at that point, you might have to use browser automation such as Selenium.
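If you go the manual route, the cookie dictionary from curlconverter can be loaded into a session so the rest of the script works unchanged. A minimal sketch; `session_from_browser_cookies` is a hypothetical helper, and `COOKIE_NAME` is a placeholder, since the actual cookie names are whatever your browser sent:

```python
import requests


def session_from_browser_cookies(cookies):
    """Build a Session preloaded with cookies copied from a logged-in browser.

    `cookies` is a plain dict, e.g. the `cookies = {...}` that curlconverter
    generates from a request copied as cURL. No cookie names are assumed.
    """
    sess = requests.Session()
    sess.headers.update({"User-Agent": "Mozilla/5.0"})
    # RequestsCookieJar accepts a plain dict in update()
    sess.cookies.update(cookies)
    return sess


# Hypothetical usage (needs network access and fresh browser cookies):
# sess = session_from_browser_cookies({"COOKIE_NAME": "value from curlconverter"})
# response = sess.get(venue_url)
# gloryMenu = scrape_untappd_menu(venue_url, cookies=sess.cookies)
```

Once the copied cookies expire, the page should fall back to the logged-out version, so a missing `menu-area` div is a sign that the cookies need refreshing.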
