
It's been a while, and I'm getting back into coding for a research project. I'm currently practicing on a test site to see what I'll need to do for the actual site.

I've got everything working how I want, but when the scraped data is written to CSV, each value goes onto a new row instead of into the next column of the row it belongs on.

I've added links to the output below. Let me know what I need to change, as I can't figure it out.

import csv, re
import requests 
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
with open('testScraperEX.csv', 'w') as f:
    write = csv.writer(f)
    soup = BeautifulSoup(page.content, "html.parser")
    write.writerow(['Title', 'Company', 'Location'])
    results = soup.find(id="ResultsContainer")
    job_elements = results.find_all("div", class_="card-content")
    for job_element in job_elements:
        title_element = job_element.find("h2", class_="title")
        company_element = job_element.find("h3", class_="company")
        location_element = job_element.find("p", class_="location")
        Title = title_element.text.strip()
        Company = company_element.text.strip()
        Location = location_element.text.strip()
        write.writerows([[Title],[Company],[Location]])

This is the current output: [screenshot 1]

This is how I want the output to be: [screenshot 2]

Thanks :)

buran
nm1999
  • You want `write.writerow([Title, Company, Location])` – buran Feb 13 '22 at 16:09
  • And [check this](https://stackoverflow.com/q/3348460/4046632) how to fix the extra blank lines between each row – buran Feb 13 '22 at 16:11
  • @buran Neither of those suggestions solves my problem. The first one is for titling the rows, and those are meant to be strings. The second one just shows me how to remove white space, which isn't a problem either. – nm1999 Feb 13 '22 at 16:29
  • I don't know how you implemented my solution, but it SOLVES the problem. This `write.writerows([[Title],[Company],[Location]])` writes 3 rows, because each element is unnecessarily wrapped in its own list (i.e. you have a list of 3 one-element lists). This `write.writerow([Title, Company, Location])` writes one row: a list of 3 elements. Note `write.writerows` vs `write.writerow` and `[[Title],[Company],[Location]]` vs `[Title, Company, Location]` – buran Feb 13 '22 at 16:32
  • As for the second one: if an extra blank line between every data line is fine with you, OK. Most people will consider this a PROBLEM in a CSV file. – buran Feb 13 '22 at 16:33
  • 'write.writerow([Title,Company,Location])' has solved it for me. I got confused thinking you meant changing the header in the line above. – nm1999 Feb 13 '22 at 16:43
  • I didn't realize you might be confused by the header line – buran Feb 13 '22 at 16:47
  • Please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors) – tripleee Feb 13 '22 at 16:56

1 Answer

import csv
import requests 
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
# newline='' prevents an extra blank line after every row on Windows
with open('testScraperEX.csv', 'w', newline='') as f:
    write = csv.writer(f)
    soup = BeautifulSoup(page.content, "html.parser")
    write.writerow(['Title', 'Company', 'Location'])  # header row
    results = soup.find(id="ResultsContainer")
    job_elements = results.find_all("div", class_="card-content")
    for job_element in job_elements:
        title_element = job_element.find("h2", class_="title")
        company_element = job_element.find("h3", class_="company")
        location_element = job_element.find("p", class_="location")
        Title = title_element.text.strip()
        Company = company_element.text.strip()
        Location = location_element.text.strip()
        # one writerow() call per job: a single row with three columns
        write.writerow([Title, Company, Location])

Output in the CSV file:

Title,Company,Location
Senior Python Developer,"Payne, Roberts and Davis","Stewartbury, AA"
Energy engineer,Vasquez-Davidson,"Christopherville, AA"
Legal executive,"Jackson, Chambers and Levy","Port Ericaburgh, AA"
Fitness centre manager,Savage-Bradley,"East Seanview, AP"
Product manager,Ramirez Inc,"North Jamieview, AP"
... and many more lines ...
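The quotation marks around values such as "Payne, Roberts and Davis" in the output above are added by csv.writer itself: with the default excel dialect (quoting=csv.QUOTE_MINIMAL), any field that contains the delimiter is quoted, so commas inside company and location names don't create extra columns. A small stdlib-only sketch, using one row copied from the output above and an in-memory buffer instead of a real file:

```python
import csv
import io

row = ["Senior Python Developer", "Payne, Roberts and Davis", "Stewartbury, AA"]

buf = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buf)  # default dialect 'excel': delimiter ',', QUOTE_MINIMAL
writer.writerow(row)

text = buf.getvalue()
print(repr(text))
# 'Senior Python Developer,"Payne, Roberts and Davis","Stewartbury, AA"\r\n'

# Reading it back only splits on unquoted commas, so the fields round-trip intact.
parsed = next(csv.reader(io.StringIO(text)))
print(parsed == row)  # True
```

This is also why a spreadsheet opening testScraperEX.csv still shows exactly three columns per job.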

This line in your code:

write.writerows([[Title],[Company],[Location]])

writes 3 rows, because each element is unnecessarily wrapped in its own list (i.e. you have a list of 3 one-element lists). This

write.writerow([Title, Company, Location]) 

writes a single row: one list of 3 elements.
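To see the difference without running the scraper, here is a minimal stdlib-only sketch that writes one job's values both ways. io.StringIO stands in for the real file, and the values are copied from the output above:

```python
import csv
import io

title, company, location = "Energy engineer", "Vasquez-Davidson", "Christopherville, AA"

# Question's version: a list of three one-element lists -> three rows.
buf_rows = io.StringIO()
csv.writer(buf_rows).writerows([[title], [company], [location]])

# Answer's version: one list of three elements -> one row.
buf_row = io.StringIO()
csv.writer(buf_row).writerow([title, company, location])

rows_out = buf_rows.getvalue().splitlines()
row_out = buf_row.getvalue().splitlines()
print(rows_out)  # ['Energy engineer', 'Vasquez-Davidson', '"Christopherville, AA"']
print(row_out)   # ['Energy engineer,Vasquez-Davidson,"Christopherville, AA"']
```

writerows is still the right call when every item you pass is already a complete row, e.g. a list of [Title, Company, Location] lists collected over the whole loop.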

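Note write.writerows vs write.writerow, and [[Title],[Company],[Location]] vs [Title, Company, Location]. Also note the newline='' argument to open(), which prevents the extra blank line between rows that you otherwise get on Windows.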
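On the blank-line issue from the comments: csv.writer ends each row with '\r\n' by default, and a text-mode file opened without newline='' on Windows translates the '\n' once more, producing '\r\r\n', which spreadsheet programs show as an empty line after every row. The sketch below only simulates that Windows translation with str.replace (an assumption standing in for a real Windows run), then writes a real temporary file with newline='' to check that the bytes are left untouched:

```python
import csv
import io
import os
import tempfile

# csv.writer ends every row with '\r\n' (the default lineterminator).
buf = io.StringIO()
csv.writer(buf).writerow(["Title", "Company", "Location"])
raw = buf.getvalue()

# Simulation only: a Windows text-mode file opened WITHOUT newline=''
# turns each '\n' it receives into '\r\n', giving '\r\r\n' per row.
simulated_windows = raw.replace("\n", "\r\n")

# With newline='' the file object does no translation, so the bytes on
# disk are exactly what csv.writer produced, on every platform.
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
try:
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(["Title", "Company", "Location"])
    with open(path, "rb") as f:
        on_disk = f.read()
finally:
    os.remove(path)

print(repr(raw))                # 'Title,Company,Location\r\n'
print(repr(simulated_windows))  # 'Title,Company,Location\r\r\n'
print(on_disk)                  # b'Title,Company,Location\r\n'
```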

buran