
Currently I'm attempting to scrape Metacritic for new video game releases, gathering the titles of games and their respective scores. The problem I am facing involves each score on the website being assigned multiple classes in the HTML. Each score has been assigned 4 different classes, and I only wish to specify 3.

Example: <div class="metascore_w large game positive">80</div>

Elements containing metascore_w, large, and game are what I wish to collect. In particular, game is essential because without this class, it returns unhelpful miscellaneous scores for movies, TV shows, and music.

I can't use the positive class because it only matches positive reviews, and I also want to collect mixed and negative reviews (which use the classes mixed and negative). For simplicity's sake I would prefer not to have to specify positive, mixed, and negative, but if it must be done I will gladly do so.


The specific issue I am facing is a head-scratcher. If I specify the starting class, it outputs just fine:

scores = soup.find_all('div', {'class': 'metascore_w'})
print(scores)

[<div class="metascore_w medium game positive">90</div>, <div class="metascore_w medium movie positive">80</div>] (etc)

If I specify all 4 classes, it outputs just fine as well:

scores = soup.find_all('div', {'class': 'metascore_w large game positive'})
print(scores)

<div class="metascore_w large game positive">80</div>, <div class="metascore_w large game positive">84</div> (etc)

But when I specify 3 classes, I receive no output:

scores = soup.find_all('div', {'class': 'metascore_w large game'})
print(scores)

[]
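A minimal offline reproduction, using hypothetical inline markup instead of the live page. In Beautiful Soup, a class value that contains spaces is compared against the element's whole class attribute string, so it matches only that exact spelling and order; a three-class subset matches nothing. One `find_all`-based alternative (an addition, not something suggested in the thread) is to pass a function that checks the parsed class list; `REQUIRED` and `has_required_classes` are illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question's examples
html = """
<div class="metascore_w large game positive">80</div>
<div class="metascore_w large game mixed">62</div>
<div class="metascore_w large game negative">35</div>
<div class="metascore_w medium movie positive">90</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A single class name matches any element whose class list contains it
single = soup.find_all("div", {"class": "metascore_w"})
# A value containing spaces is compared with the whole class attribute string,
# so it only matches that exact spelling and order...
exact = soup.find_all("div", {"class": "metascore_w large game positive"})
# ...and a three-class subset matches nothing
subset = soup.find_all("div", {"class": "metascore_w large game"})

# Hypothetical helper: require all three classes, ignore any others
REQUIRED = {"metascore_w", "large", "game"}

def has_required_classes(tag):
    # "class" is parsed into a list of names, e.g. ['metascore_w', 'large', ...]
    return tag.name == "div" and REQUIRED.issubset(tag.get("class", []))

matched = soup.find_all(has_required_classes)

print(len(single), len(exact), len(subset))  # 4 1 0
print([div.text for div in matched])         # ['80', '62', '35']
```

Because the function only tests for the three required names, positive, mixed, and negative scores are all kept without listing them.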

If anyone has any idea how I could solve this problem, I would greatly appreciate it! Thank you for reading!

Senuvox
  • Use CSS selectors, e.g., `soup.select('div.metascore_w.large.game')`. The classes don't need to be in any particular order, e.g., `soup.select('div.large.game.metascore_w')` should work too. –  May 18 '21 at 20:31
  • Can you share the URL? – Andrej Kesely May 18 '21 at 20:32
  • https://www.metacritic.com/browse/games/release-date/new-releases/all/date – Senuvox May 18 '21 at 20:33
  • @JustinEzequiel This definitely seems like a step in the right direction, but another problem I didn't foresee has come up: user scores, which have the class user, are being returned alongside the intended official scores. Is there any way to exclude them?
    6.1
    71
    – Senuvox May 18 '21 at 20:37
  • Here's a [post](https://stackoverflow.com/a/38033910/5386938) discussing `soup.select(...)`. –  May 18 '21 at 20:39
  • Try `soup.select(':not(.user).metascore_w.large.game')`. –  May 18 '21 at 20:44
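A small offline check of the two selector suggestions from the comments, run against hypothetical inline markup rather than the live page. It assumes, as the comments describe, that user scores carry the same metascore_w/large/game classes plus user; the selector is written with the type selector first (`div.metascore_w.large.game:not(.user)`), which should be equivalent to the comment's form.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the user score is assumed to share the
# metascore_w/large/game classes and add "user"
html = """
<div class="metascore_w large game positive">80</div>
<div class="metascore_w large game mixed">62</div>
<div class="metascore_w user large game mixed">6.1</div>
<div class="metascore_w medium movie positive">90</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Each .class is a separate requirement, so order in the selector is irrelevant
in_order = [d.text for d in soup.select("div.metascore_w.large.game")]
reordered = [d.text for d in soup.select("div.large.game.metascore_w")]

# :not(.user) drops elements that also have the "user" class
critic_only = [d.text for d in soup.select("div.metascore_w.large.game:not(.user)")]

print(in_order)     # ['80', '62', '6.1']
print(reordered)    # ['80', '62', '6.1']
print(critic_only)  # ['80', '62']
```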



To get titles and metascores from the site, you can use this example:

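import requests
from bs4 import BeautifulSoup

url = "https://www.metacritic.com/browse/games/release-date/new-releases/all/date"

# Browser-like User-Agent; requests without one may be blocked
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}

response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.content, "html.parser")

# Each game title is an <h3> inside the product list; its metascore is
# the first element with class "metascore_w" that follows the title
for h3 in soup.select(".product_groups_wrapper h3"):
    title = h3.text
    metascore = h3.find_next(class_="metascore_w").text
    print("{:<50} {:<4}".format(title, metascore))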

Prints:

Subnautica: Below Zero                             80  
Assassin's Creed Valhalla: Wrath of the Druids     71  
Resident Evil Village                              84  
Hood: Outlaws & Legends                            64  
Wreckfest                                          78  
Shin Megami Tensei III: Nocturne HD Remaster       78  
Mass Effect Legendary Edition                      89  
Resident Evil Village                              81  
Skate City                                         64  
The Colonists                                      73  
Subnautica: Below Zero                             84  
Assassin's Creed Valhalla: Wrath of the Druids     73  
Resident Evil Village                              85  
Hood: Outlaws & Legends                            61  
Before I Forget                                    88  
Mass Effect Legendary Edition                      90  
Dull Grey                                          55  
Protocol                                           49  
NieR Replicant ver.1.22474487139...                83  
Smelter                                            82  
Shin Megami Tensei III: Nocturne HD Remaster       82  
Subnautica: Below Zero                             78  
Famicom Detective Club: The Missing Heir           74  
Famicom Detective Club: The Girl Who Stands Behind 74  
Skate City                                         67  
Shin Megami Tensei III: Nocturne HD Remaster       77  
Siege Survival: Gloria Victis                      65  
Days Gone                                          78  
Subnautica: Below Zero                             82  
Mass Effect Legendary Edition                      85  
World of Demons                                    80  
Fantasian                                          79  
CLAP HANZ GOLF                                     80  
The Oregon Trail                                   76  
Crash Bandicoot: On the Run!                       59  
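The answer's pairing of each `<h3>` title with the next `metascore_w` element can be tried offline. The markup below is a hypothetical, simplified stand-in for the page (the real structure may differ), and collecting `(title, score)` tuples in `rows` is an illustrative choice rather than part of the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the live page structure may differ
html = """
<div class="product_groups_wrapper">
  <a><h3>Resident Evil Village</h3></a>
  <div class="metascore_w large game positive">84</div>
  <a><h3>Hood: Outlaws &amp; Legends</h3></a>
  <div class="metascore_w large game mixed">64</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

rows = []
for h3 in soup.select(".product_groups_wrapper h3"):
    # find_next() walks forward in document order, so each title is
    # paired with the first metascore_w element that follows it
    score = h3.find_next(class_="metascore_w")
    rows.append((h3.get_text(strip=True), score.get_text(strip=True)))

for title, metascore in rows:
    print(f"{title:<50} {metascore:<4}")
```

Scores are kept as strings here, since a listing may show a non-numeric placeholder instead of a number.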
Andrej Kesely