2

I'm looking to write data I have extracted with BeautifulSoup to a .csv file.

This is the code I use to extract it:

from requests import get
from bs4 import BeautifulSoup

url = 'https://howlongtobeat.com/game.php?id=38050'
response = get(url)
html_soup = BeautifulSoup(response.text, 'html.parser')

game_name = html_soup.select('div.profile_header')[0].text
game_length = html_soup.select('div.game_times li div')[-1].text
game_developer = html_soup.find_all('strong', string='\nDeveloper:\n')[0].next_sibling
game_publisher = html_soup.find_all('strong', string='\nPublisher:\n')[0].next_sibling
game_console = html_soup.find_all('strong', string='\nPlayable On:\n')[0].next_sibling
game_genres = html_soup.find_all('strong', string='\nGenres:\n')[0].next_sibling

I would like to write these results to a CSV file. The code pulls the information I want, but I think the values need to be cleaned up.

I'm not sure how to write to CSV or how to clean up the data.

Please help.

littlejiver
  • Possible duplicate of [Python write to CSV line by line](https://stackoverflow.com/questions/37289951/python-write-to-csv-line-by-line) – Pierre Jun 09 '18 at 21:04

3 Answers

1

You can use csv.writer:

import csv, re
import requests
from bs4 import BeautifulSoup as soup

flag = False  # becomes True once the header row has been written
# newline='' stops the csv module from adding blank lines between rows on Windows
with open('filename.csv', 'w', newline='') as f:
  write = csv.writer(f)
  for i in range(1, 30871):
    s = soup(requests.get(f'https://howlongtobeat.com/game.php?id={i}').text, 'html.parser')
    header = s.find('div', {"class":'profile_header shadow_text'})
    length = [[li.find('h5').text, li.find("div").text] for li in s.find_all('li', {'class':'time_100'})]
    if header is None or not length: #skip ids that have no game page or no time entry
      continue
    if not flag: #write header to file once
      write.writerow(['Name', 'Length']+[re.sub('[:\n]+', '', d.find('strong').text) for d in s.find_all('div', {'class':'profile_info'})])
      flag = True
    name = header.text
    stats = [re.sub(r'\n+[\w\s]+:\n+', '', d.text) for d in s.find_all('div', {'class':'profile_info'})]
    write.writerow([name, length[0][-1]]+stats[:4])
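
One general way to guard against `find()` returning `None` (the cause of the `AttributeError` reported in the comments below) is a small helper. This is only a sketch: `text_or_default` is a hypothetical name, not part of BeautifulSoup, and the `SimpleNamespace` object stands in for a real tag so the example runs without a network request.

```python
from types import SimpleNamespace


def text_or_default(tag, default=""):
    """Return the tag's stripped text, or `default` when the lookup found nothing."""
    return tag.text.strip() if tag is not None else default


# Stand-in for the object that s.find(...) returns when the element exists
found = SimpleNamespace(text="\nExample Game\n")

print(text_or_default(found))                # element present
print(text_or_default(None, default="N/A"))  # element missing, as on an empty id
```

In the loop above this would be used as `name = text_or_default(s.find('div', {"class":'profile_header shadow_text'}))`, after which rows with an empty name can be skipped.
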
Ajax1234
  • That works, but I'm missing the game name. I'm not sure where I would put that in. – littlejiver Jun 09 '18 at 21:24
  • Perfect! I'm still trying to understand how it works, but it definitely got the job done. Would you happen to know how to write this data for all the URLs on the site, https://howlongtobeat.com/game.php?id= (from 0 to 30870)? – littlejiver Jun 09 '18 at 21:36
  • Thanks again, but once more my newbie brain can't solve the problem. I'm getting this error: `File "hltb2.py", line 12, in name = s.find('div', {"class":'profile_header shadow_text'}).text AttributeError: 'NoneType' object has no attribute 'text'`. Do you know why? Also, the range is from 1 to 30870; does that make a difference? – littlejiver Jun 09 '18 at 21:48
  • @littlejiver Yes, that probably does :) Please see my recent edit. – Ajax1234 Jun 09 '18 at 23:21
0

You can use Python's csv module: https://docs.python.org/3/library/csv.html or https://docs.python.org/2.7/library/csv.html.
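
As a rough illustration of that module, the sketch below uses `csv.DictWriter` to produce a header row plus one data row. The function name `rows_to_csv`, the field names, and the row values are all placeholders made up for the example, not real scraped data. It writes to an in-memory buffer so it runs anywhere.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text, header row first."""
    buffer = io.StringIO(newline="")  # newline="" leaves the csv line endings untouched
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values only
fields = ["name", "length", "developer"]
rows = [{"name": "Example Game", "length": "10 Hours", "developer": "Example Studio, Inc."}]

print(rows_to_csv(rows, fields))
```

To write to disk instead, pass a file opened with `open('games.csv', 'w', newline='')` to `csv.DictWriter` in place of the buffer. Fields that contain commas are quoted automatically under the default dialect.
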

Jānis Š.
0

To write this data to a CSV file:

import csv

game_info = [game_name, game_publisher, game_console, game_genres, game_length, game_developer]
# newline='' lets the csv module control the line endings itself
with open("game.csv", 'w', newline='') as outfile:
    # comma-delimited dialect without quoting; special characters are backslash-escaped
    csv.register_dialect('custom', delimiter=',', quoting=csv.QUOTE_NONE, escapechar='\\')
    writer = csv.writer(outfile, 'custom')
    writer.writerow(game_info)
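
On the "clean up" part of the question: the scraped values will probably carry leading and trailing newlines. A minimal sketch of one way to normalise them before writing, assuming plain whitespace collapsing is enough; `clean` is a hypothetical helper and the sample strings are placeholders.

```python
def clean(value):
    """Collapse any run of whitespace (newlines included) into a single space."""
    return " ".join(str(value).split())


# Placeholder strings shaped like the scraped values
raw_values = ["\nExample Game\n", "\n Example Studio \n", "Action,\n Adventure"]

cleaned = [clean(v) for v in raw_values]
print(cleaned)
```

Applied to the list above, this would be `game_info = [clean(v) for v in game_info]` before the call to `writerow`.
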
yadavankit
  • I'm getting an error when running this: `File "hltb.py", line 18, in game_info = [name, publisher, console, genre, length, developer] NameError: name 'name' is not defined`. Can you help? – littlejiver Jun 09 '18 at 21:21
  • `name` is actually `game_name` according to your code – yadavankit Jun 09 '18 at 21:25
  • Thanks guys (sorry, I'm new at this). Now it's giving me `NameError: name 'csv_filename' is not defined`. Do I have to make a CSV file with that name? (I know I can change the name.) – littlejiver Jun 09 '18 at 21:33