
Trying to figure out how to extract the name of the game with BeautifulSoup

I think I'm having a problem with the HTML side of it.

Here's what I have so far:

from requests import get

url = 'https://howlongtobeat.com/game.php?id=38050'

response = get(url)

from bs4 import BeautifulSoup

html_soup = BeautifulSoup(response.text, 'html.parser')

game_length = html_soup.find_all('div', class_='game_times')

length = (game_length[-1].find_all({'li': '    short time_100 shadow_box'})[-1].contents[3].get_text())

print(length)

game_name = html_soup.find_all('div', class_='profile_header_game')

game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())

print(game)

I'm getting the length but not the game name. Why?

print(length) prints:

31 Hours 

but print(game) gives:

game_name = html_soup.find_all('div', class_='profile_header_game')
game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
  File "", line 1
    game = (game_name[].find({"profile_header shadow_text"})[].contents[].get_text())
                      ^
SyntaxError: invalid syntax
print(game)
Traceback (most recent call last):
  File "", line 1, in
NameError: name 'game' is not defined

What am I doing wrong?

littlejiver

2 Answers


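It looks like there are a few syntax issues in your code. `game_name[]` is not valid Python: indexing a list needs an index, such as `game_name[0]`. Also, the first argument to `find()` is the tag name, so `{"profile_header shadow_text"}` will not match on the class. Here is a corrected version: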

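from bs4 import BeautifulSoup
import requests

url = 'https://howlongtobeat.com/game.php?id=38050'
response = requests.get(url)

html_soup = BeautifulSoup(response.text, 'html.parser')
# find() returns the first matching tag (or None); find_all() returns a list
game_times_tag = html_soup.find('div', class_='game_times')

# Collect (category title, play time) pairs from each <li> in the block.
# The loop body has no blank lines, so it also pastes cleanly into the
# interactive interpreter, where a blank line ends the block.
game_time_list = []
for li_tag in game_times_tag.find_all('li'):
    title = li_tag.find('h5').text.strip()
    play_time = li_tag.find('div').text.strip()
    game_time_list.append((title, play_time))

for game_time in game_time_list:
    print(game_time)

# The game title is in the div whose class attribute is "profile_header shadow_text"
profile_header_tag = html_soup.find("div", {"class": "profile_header shadow_text"})
game_name = profile_header_tag.text.strip()
print(game_name)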
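To see why the original line fails: empty brackets such as `game_name[]` are not valid indexing syntax, so Python rejects the whole line before running any of it. The assignment never happens, which is also why the following `print(game)` raises `NameError`. A small check using the built-in `compile()` (the helper name `is_valid_python` is just for illustration):

```python
def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python code (it is not executed)."""
    try:
        # compile() only parses and compiles, so undefined names are fine here
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


# The question's line, simplified: empty [] is a syntax error
print(is_valid_python("game = game_name[].get_text()"))   # False
# With an explicit index the line parses fine
print(is_valid_python("game = game_name[0].get_text()"))  # True
```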
Pierre
  • Thanks, that helps, but I am trying to get the game name as well. Any tips? I'm also looking to make a spreadsheet out of the data; how can I extract it in a "cleaner" format? – littlejiver Jun 09 '18 at 17:51
  • Thanks for the feedback. Indeed, it looks like I initially forgot to add the game name to the code! I updated my answer to fix that. If you want to get the data into a spreadsheet, you could just write it to a CSV file, which can easily be opened with a spreadsheet editor (Excel, ...). Consider checking out this [great answer](https://stackoverflow.com/a/37290105/7663649) about CSV writers if you want more details. – Pierre Jun 09 '18 at 17:55
  • I'm having a problem with this code (even before the edit). Can you confirm that it should work? I'm getting this error: **game_time_list.append((title, play_time)) File "", line 1 game_time_list.append((title, play_time)) ^ IndentationError: unexpected indent** The game name part works perfectly. – littlejiver Jun 09 '18 at 18:01
  • It looks like this part of the code is not indented properly. Make sure that the whitespace before the line contains exactly 4 spaces. – Pierre Jun 09 '18 at 18:11
  • Thanks, I'm going to try that! I was also wondering (I'm very new to all this): I am copying and pasting the code into the Ubuntu command console. Is there a better way (a program, maybe) to run the code? (This may also be my issue, but I'm not sure.) – littlejiver Jun 09 '18 at 18:14
  • Yes, you are right! An easier way is to create a Python file, for example "script.py". You can then paste the code inside the file and run it in your command console using `python script.py`. – Pierre Jun 09 '18 at 18:17
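Following the suggestion in the comments to write the data to CSV, here is a minimal sketch using Python's built-in `csv` module. The function name `write_game_times`, the column names, and the sample rows are hypothetical; in practice you would pass in the `game_name` and `game_time_list` built by the answer's code.

```python
import csv


def write_game_times(file_obj, game_name, game_time_list):
    """Write a header row, then one CSV row per (title, play_time) pair."""
    writer = csv.writer(file_obj)
    writer.writerow(["game", "category", "time"])
    for title, play_time in game_time_list:
        writer.writerow([game_name, title, play_time])


# Hypothetical sample data shaped like the answer's game_time_list
rows = [("Main Story", "12 Hours"), ("Completionist", "31 Hours")]

# The csv docs recommend opening the file with newline="" for csv.writer
with open("game_times.csv", "w", newline="", encoding="utf-8") as f:
    write_game_times(f, "Example Game", rows)
```

The resulting `game_times.csv` opens directly in Excel or LibreOffice Calc, one row per play-time category.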

A shorter version, using CSS selectors:

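# select() takes a CSS selector and always returns a list
game_length = html_soup.select('div.game_times li div')[-1].text
game_name = html_soup.select('div.profile_header')[0].text
# string= must match the tag's text exactly, including the newlines
developer = html_soup.find_all('strong', string='\nDeveloper:\n')[0].next_sibling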
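These selectors can be tried offline against a small hard-coded snippet. The markup below is hypothetical and only mimics the class names used in this thread, so the real page may be structured differently. This sketch uses `select_one()` and explicit `None` checks so that a missing element does not crash the script.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup mimicking the class names in this thread
SAMPLE_HTML = """
<div class="profile_header shadow_text">Some Game</div>
<div class="game_times">
  <ul>
    <li><h5>Main Story</h5><div>12 Hours</div></li>
    <li><h5>Completionist</h5><div>31 Hours</div></li>
  </ul>
</div>
<div class="profile_info"><strong>
Developer:
</strong> Some Studio</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of matches; [-1] picks the last one
game_length = soup.select("div.game_times li div")[-1].text

# select_one() returns the first match or None, so no [0] is needed
header = soup.select_one("div.profile_header")
game_name = header.text.strip() if header is not None else None

# The label's text is "\nDeveloper:\n"; the value is the text node after it
label = soup.find("strong", string="\nDeveloper:\n")
developer = label.next_sibling.strip() if label is not None else None

print(game_length)  # 31 Hours
print(game_name)    # Some Game
print(developer)    # Some Studio
```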
bobrobbob