0

I have seen similar issues when trying to scrape the HTML of these links: https://agsjournals.onlinelibrary.wiley.com/toc/15325415/2021/69/7 and https://www.just-eat.fr/en/delivery/italian-style-pizza/paris-13e-arrondissement-centre. The issue is that the HTML I get back from my request is not the HTML of the webpage I see in the browser. The error below is part of the HTML that gets pulled; this specific snippet is from the first link:

  <div class="cf-alert cf-alert-error cf-cookie-error" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>

Where do I configure cookies so that I can scrape the data from the site? I am currently using bs4 version 4.9.3 and requests version 2.25.1. Any help is much appreciated.

Austin

2 Answers

1

You need to send request headers that include a browser-like User-Agent; otherwise the site may serve the error page instead of the actual content.

You can learn more about headers from this Stack Overflow thread

import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the site serves the real page
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"}
url = "https://agsjournals.onlinelibrary.wiley.com/toc/15325415/2021/69/7"
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
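Before parsing, it can help to confirm that the response is the real page and not the error page shown in the question. The following is only a rough sketch: it uses the standard library's html.parser so that it runs without extra dependencies, the names BlockPageDetector and looks_blocked are made up for illustration, and it assumes the error page always carries the cf-cookie-error class seen in the question's snippet.

```python
from html.parser import HTMLParser


class BlockPageDetector(HTMLParser):
    """Flags HTML that contains an element with the 'cf-cookie-error' class."""

    def __init__(self):
        super().__init__()
        self.blocked = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "cf-cookie-error" in classes:
            self.blocked = True


def looks_blocked(html):
    """Return True if the HTML looks like the cookie error page (assumed marker)."""
    detector = BlockPageDetector()
    detector.feed(html)
    return detector.blocked
```

Calling looks_blocked(res.text) right after the request gives an early signal that the headers were not enough, instead of silently parsing an error page.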

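The second website renders its data with JavaScript, so bs4 will not find any of it in the HTML that requests returns. Instead, you can look at the XHR requests in the Network tab of your browser's developer tools and request the underlying data URL directly.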
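As a rough sketch of the XHR idea: once the Network tab shows a request that returns JSON, that URL can be fetched directly. The sketch below uses the standard library's urllib so that it has no dependencies (the same headers could be passed to requests.get). The function names are made up for illustration, and the real endpoint URL is not known from this thread, so it has to be copied from the Network tab.

```python
import json
import urllib.request

# Same browser-like User-Agent string as in the requests example
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36"
)


def build_xhr_request(url):
    """Build a GET request that asks for JSON and sends a browser-like User-Agent."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
    )


def fetch_json(url, timeout=10):
    """Fetch the URL and decode the body as JSON (assumes a UTF-8 JSON response)."""
    with urllib.request.urlopen(build_xhr_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Some endpoints also expect cookies or extra headers that the browser sends, so the request shown in the Network tab is the reference for what to include.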

Bhavya Parikh
  • You're a legend! Thank you, your solution worked great. I will check out your idea for the second link. – Austin Sep 02 '21 at 01:00
1

For the second website, try using Selenium to get data from pages that are rendered with JavaScript. Review this Stack Overflow thread to learn more about that.
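A minimal sketch of the Selenium approach, assuming Selenium 4 and a local Chrome installation are available. The function name fetch_rendered_html and the optional wait_css parameter are made up for illustration; which CSS selector to wait for depends on the page and is not known from this thread.

```python
def fetch_rendered_html(url, wait_css=None, timeout=10):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so this module still loads where Selenium is not installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # run without opening a browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        if wait_css:
            # Wait until an element matching the selector is present in the DOM
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
            )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The returned HTML can then be passed to BeautifulSoup(html, "html.parser") and parsed the same way as a requests response.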

Mhd O.
  • Please add further details to expand on your answer, such as working code or documentation citations. – Community Sep 01 '21 at 07:14
  • I will look into this. The thread made a good distinction between parsers and web retrieval tools. – Austin Sep 02 '21 at 01:01