
I am trying to retrieve news headlines (and more later, but I'm starting with headlines) from the cards on https://valorantesports.com/news.

from bs4 import BeautifulSoup
import requests

# The 'lxml' and 'html5lib' parsers only need to be installed, not imported.
page = requests.get("https://valorantesports.com/news/")
soup = BeautifulSoup(page.content, 'html.parser')  # I have also tried 'lxml' and 'html5lib', same issue
print(soup)
esports_cards = soup.find('div', class_="a3308")  # the container div I see in Inspect Element
print(esports_cards)

With the output:

<!DOCTYPE html>
<html><head><title>VALORANT Esports</title><meta charset="utf-8"/><meta content="ie=edge" http-equiv="x-ua-compatible"/><meta content="The best place to watch VALORANT Esports!" name="description"/><meta content="width=device-width,initial-scale=1,shrink-to-fit=no" name="viewport"/><meta content="yes" name="apple-mobile-web-app-capable"/><meta content="black" name="apple-mobile-web-app-status-bar-style"/><meta content="VALORANT Esports" name="apple-mobile-web-app-title"/><meta content="VALORANT Esports" name="application-name"/><meta content="#000000" name="theme-color"/><meta content="yes" name="mobile-web-app-capable"/><meta content="#000000" name="msapplication-navbutton-color"/><meta content="/schedule" name="msapplication-starturl"/><link as="font" crossorigin="" href="https://assets.valorantesports.com/fonts/DINNextLTPro-Regular.woff2" rel="preload" type="font/woff2"/><link as="font" crossorigin="" href="https://assets.valorantesports.com/fonts/DINNextLTPro-Medium.woff2" rel="preload" type="font/woff2"/><link as="font" crossorigin="" href="https://assets.valorantesports.com/fonts/Tungsten-Bold.woff2" rel="preload" type="font/woff2"/><link as="script" href="https://valorantesports.com/location.js?sport=val" rel="preload"/><link href="https://am-a.akamaihd.net/image?resize=180:180&amp;f=https://static.lolesports.com/val/vct-logo.png" rel="apple-touch-icon" sizes="180x180" type="image/png"><link href="https://am-a.akamaihd.net/image?resize=48:48&amp;f=https://static.lolesports.com/val/vct-logo.png" rel="icon" sizes="48x48" type="image/png"><link href="/manifest.json" rel="manifest"><script defer="defer" src="https://valorantesports.com/location.js?sport=val"></script><script defer="defer" src="/vendor.83b586bf56aeeeb2d2ef.js"></script><script defer="defer" src="/main.639eea0dd31ade50ebd7.js"></script><link href="/main.9ba6848184c8a62405bf.css" rel="stylesheet"/></link></link></link></head><body bgcolor="#0f1519"><script>/* Display outdated message for old 
browsers. Use old JS (ES3) only */
      if(!window.Promise) {
        var scriptTag = document.createElement('script')
        scriptTag.setAttribute('type', 'text/javascript')
        scriptTag.setAttribute('src', '/outdated-browser.js')
        document.head.appendChild(scriptTag)
      }</script></body></html>
None

However, using Inspect Element on the webpage, I can clearly see that the div and class I am referencing, which contains the cards, exists.

(Screenshot: Inspect Element view of the webpage)

When I look through the output of print(soup) manually, none of the information I can see while inspecting the webpage appears in the retrieved HTML. I've followed several tutorials on this topic (as well as some Stack Overflow answers suggesting different parsers), but they all start with this same process and have no trouble finding their target in the retrieved soup for the sample webpages they use.

Could someone please explain why my parsed page content does not match the actual webpage, and what I could do to retrieve the information correctly and continue? Thank you, and I apologize if any debugging info or context is missing.

  • The news page itself is just very basic HTML with some JavaScript and no actual content. The JavaScript then loads the actual news. – luk2302 Dec 30 '21 at 17:02
  • Thank you for pointing this out. I am not familiar with scraping JavaScript-loaded content, but this at least gives me a much better direction, so I'll go from there. – Luke Lawrence Dec 30 '21 at 17:03
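
The point made in the comments can be checked offline before reaching for other tools. Below is a minimal sketch, assuming a simple heuristic: if the fetched HTML has no visible text inside <body> once <script>, <style>, and <noscript> are ignored, the content is probably filled in by JavaScript after the page loads. The names BodyTextCollector, visible_body_text, and looks_client_rendered are hypothetical helpers written for this illustration, not part of requests or BeautifulSoup, and the two sample strings are shortened stand-ins rather than the real response. It uses only the standard library so it runs without extra installs.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Collect visible text inside <body>, ignoring script-like tags."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is in <body> and outside skipped tags.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    """Return the visible <body> text of an HTML string, space-joined."""
    parser = BodyTextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_client_rendered(html):
    """Heuristic (an assumption, not a guarantee): empty body text."""
    return visible_body_text(html) == ""


# Shortened stand-in for the shell page shown in the question.
shell_page = (
    "<html><head><title>VALORANT Esports</title></head>"
    '<body bgcolor="#0f1519"><script>if(!window.Promise) { var s = 1 }'
    "</script></body></html>"
)

# Made-up example of a page whose content is present in the HTML itself.
static_page = (
    '<html><body><div class="card"><h2>Headline</h2></div></body></html>'
)

print(looks_client_rendered(shell_page))
print(looks_client_rendered(static_page))
```

If the check reports that the page looks client-rendered, requests on its own will not see the cards. The usual options are to drive a real browser with an automation tool such as Selenium or Playwright and parse the rendered HTML, or to look in the browser's Network tab for the request that actually delivers the news data. Which one fits this site is not something the sketch can tell you.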

0 Answers