0

I'm trying to scrape some data. When I check it with print, I get multiple entries printed. However, the CSV ends up with only one entry. Can you help, please? Thanks a lot.

import csv
import time
import requests
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium import webdriver


job_Details = []
job_links = []



chrome_options = Options()
'''chrome_options.add_argument("--headless")'''
driver = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe', options=chrome_options)
driver.get(f'https://remotejobs.world/')
'''SCROLL_PAUSE_TIME = 20'''
last_height = driver.execute_script("return document.body.scrollHeight")
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
divs = driver.find_elements_by_tag_name('h2')
for div in divs:
    job_Details.append(div)
    link = div.find_element_by_tag_name('a')
    job_links.append(link)

for job_detail, job_link in zip(job_Details, job_links):
    if job_detail and job_link:
        print(job_link.get_attribute('href'))
        print(job_detail.text)
        url = job_link.get_attribute('href')
        new_page = requests.get(url).text
        time.sleep(2)
        soup = BeautifulSoup(new_page, 'html.parser')
        job_desc = soup.find('div', class_='w-full md:w-2/3')
        if job_desc:
            print(job_desc.text) #Successful Prints.
            dict = {'Job_title and Company': job_detail.text, "Job link": job_link.get_attribute('href'),
                        "Job Details": job_desc.text}
            with open('remoteWORLD.csv', 'w') as f:
                w = csv.DictWriter(f, dict.keys())
                w.writeheader()
                w.writerow(dict)
Abhishek Rai
  • Because you have `with open('remoteWORLD.csv', 'w') as f:` within your for loop so you'll only get the last entry in your CSV file. –  Dec 02 '20 at 14:10
  • @JustinEzequiel When I move it outside it says Dict might be undefined? – Abhishek Rai Dec 02 '20 at 14:12
  • You only need `dict` for the header row and the keys are fixed! And I suggest you rename `dict` as it shadows the builtin. –  Dec 02 '20 at 14:25
  • @JustinEzequiel Can you please check..it is writing the first entry in the third column.. – Abhishek Rai Dec 02 '20 at 15:05
  • Look at your `dict = {...}` line. `'Job_title and Company': job_detail.text` and `"Job Details": job_desc.text`. If that's wrong then you need to switch the values. –  Dec 02 '20 at 15:16
  • BTW, that's from your original code. –  Dec 02 '20 at 15:17
  • @JustinEzequiel No, that's not wrong..I am saying ..it is not printing the job_desc.text at all.. – Abhishek Rai Dec 02 '20 at 15:26
  • Do you mean `print(job_desc.text)` does not print? If so, then your condition `if job_desc:` is False meaning your soup.find(...) returned None which requires you to post a new question. –  Dec 02 '20 at 15:30
  • @JustinEzequiel It prints it successfully, just not in the csv/dict. You can run it and check it yourself, if possible for you. – Abhishek Rai Dec 02 '20 at 15:49
  • @JustinEzequiel It is kind of truncating the print..I don't know why..It's a comparatively larger amount of text..could that be the reason? – Abhishek Rai Dec 02 '20 at 16:05
  • Open the CSV in a text editor. My guess is the "text" contains a whole bunch of whitespace and what you are using to view (Excel?) the CSV cannot display the data. –  Dec 02 '20 at 16:09
  • @JustinEzequiel Damn!..that makes sense...Let me check. – Abhishek Rai Dec 02 '20 at 16:13
  • @JustinEzequiel You are right..I couldn't have thought of it myself though..would have remained stuck..Thanks a ton. – Abhishek Rai Dec 02 '20 at 16:14
  • When in doubt, always open a text editor. I recommend Notepad++. –  Dec 02 '20 at 16:15
  • @JustinEzequiel :) ..Checked it in NP++ ..I like it very much. – Abhishek Rai Dec 02 '20 at 16:23
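
For anyone who hits the same display problem discussed in the comments above: one possible workaround is to collapse runs of whitespace in the scraped text before writing it to the CSV. This is only a sketch. `collapse_whitespace` is a hypothetical helper name and the sample string is made up, not taken from the site.

```python
def collapse_whitespace(text):
    """Replace every run of spaces, tabs and newlines with a single space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(text.split())


# Made-up stand-in for job_desc.text, which may contain many blank lines.
raw = "\n\n   Senior Developer \n\n\t Remote,   full time \n"
cleaned = collapse_whitespace(raw)
print(repr(cleaned))
```

In the scraper this would be applied to `job_desc.text` just before building the row. BeautifulSoup's `get_text(" ", strip=True)` is another option that gives a similar result.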

2 Answers

1

See below how simple my suggestion was.

# Open the file once, before the loop, so every row goes into the same file.
# newline='' is what the csv module recommends to avoid blank lines on Windows.
with open('remoteWORLD.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, ['Job_title and Company', "Job link", "Job Details"])
    w.writeheader()
    for job_detail, job_link in zip(job_Details, job_links):
        if job_detail and job_link:
            url = job_link.get_attribute('href')
            print(url)
            print(job_detail.text)
            new_page = requests.get(url).text
            time.sleep(2)
            soup = BeautifulSoup(new_page, 'html.parser')
            job_desc = soup.find('div', class_='w-full md:w-2/3')
            if job_desc:
                print(job_desc.text) #Successful Prints.
                # Named row instead of dict so the builtin is not shadowed.
                row = {'Job_title and Company': job_detail.text, "Job link": url,
                            "Job Details": job_desc.text}
                w.writerow(row)

0

Try using:

with open('remoteWORLD.csv', 'a+') as f:

Mode w truncates the file every time it is opened, so each pass through the loop overwrites the previous row. Mode a appends to the end of the file instead (the + means the file can be read as well as written).
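
To see the difference between the two modes without running the scraper, here is a minimal sketch that reopens a file for every row, the way the loop in the question does. The rows and the helper names (`write_rows_reopening`, `read_rows`) are made up for illustration, and the files go into a temporary directory.

```python
import csv
import os
import tempfile

FIELDS = ["Job_title and Company", "Job link", "Job Details"]

# Made-up rows standing in for the scraped data.
ROWS = [
    {"Job_title and Company": "Dev at A", "Job link": "https://example.com/1", "Job Details": "first"},
    {"Job_title and Company": "Ops at B", "Job link": "https://example.com/2", "Job Details": "second"},
]


def write_rows_reopening(path, mode):
    """Reopen the file for every single row, as the question's loop does."""
    for row in ROWS:
        with open(path, mode, newline="", encoding="utf-8") as f:
            csv.DictWriter(f, FIELDS).writerow(row)


def read_rows(path):
    """Return the CSV contents as a list of rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w_mode.csv")
    a_path = os.path.join(tmp, "a_mode.csv")
    write_rows_reopening(w_path, "w")
    write_rows_reopening(a_path, "a")
    w_rows = read_rows(w_path)
    a_rows = read_rows(a_path)

print(len(w_rows))  # rows left after reopening with 'w' each time
print(len(a_rows))  # rows left after reopening with 'a' each time
```

Keep in mind that with a, running the script again adds to whatever an earlier run left in the file, and a `writeheader()` call inside the loop would repeat the header before every row.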

Check here for more explanations: Difference between modes a, a+, w, w+, and r+ in built-in open function?

EDIT: Or, as Justin says, move the open() call outside of the loop and write the contents of your lists job_Details and job_links to it.
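
A rough sketch of that idea, assuming the scraped values have already been collected into a list: build all the rows first, then write the header and the rows in one go. The tuples in `scraped` are made-up placeholders and `write_jobs` is a hypothetical helper. An in-memory buffer stands in for the real file so the example runs on its own.

```python
import csv
import io

FIELDS = ["Job_title and Company", "Job link", "Job Details"]


def write_jobs(f, rows):
    """Write the header once, then all rows, to an already open file object."""
    w = csv.DictWriter(f, fieldnames=FIELDS)
    w.writeheader()
    w.writerows(rows)


# Made-up placeholders for (title, link, description) gathered in the loop.
scraped = [
    ("Dev at A", "https://example.com/1", "desc one"),
    ("Ops at B", "https://example.com/2", "desc two"),
]
rows = [dict(zip(FIELDS, item)) for item in scraped]

buf = io.StringIO()
write_jobs(buf, rows)
lines = buf.getvalue().splitlines()
print(lines[0])
```

In the real script the buffer would be replaced by something like `open('remoteWORLD.csv', 'w', newline='')`, opened a single time after the loop has finished collecting.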

ob9528