
[DISCLAIMER] I have been through plenty of the other answers on this topic, but none of them seem to work for me.

I want to be able to export the data I have scraped as a CSV file.

My question is: how do I write the code that outputs this data to a CSV file?

Current Code

import requests
from bs4 import BeautifulSoup 

url = "http://implementconsultinggroup.com/career/#/6257"
r = requests.get(url)

req = requests.get(url).text
soup = BeautifulSoup(r.content)
links = soup.find_all("a")

for link in links:
    if "career" in link.get("href") and 'COPENHAGEN' in link.text:
        print "<a href='%s'>%s</a>" % (link.get("href"), link.text)

Output from the code

View Position

</a>
<a href='/career/management-consultants-to-help-our-customers-succeed-with-
it/'>
Management consultants to help our customers succeed with IT
COPENHAGEN • At Implement Consulting Group, we wish to make a difference in 
the consulting industry, because we believe that the ability to create Change 
with Impact is a precondition for success in an increasingly global and 
turbulent world.




View Position

</a>
<a href='/career/management-consultants-within-process-improvement/'>
Management consultants within process improvement
COPENHAGEN • We are looking for consultants with profound
experience in Six Sigma, Lean and operational
management

Code I have tried

with open('ImplementTest1.csv',"w") as csv_file:
     writer = csv.writer(csv_file)
     writer.writerow(["link.get", "link.text"])
     csv_file.close()

Output in CSV format

Column 1: Url Links

Column 2: Job description

E.g

Column 1: /career/management-consultants-to-help-our-customers-succeed-with-it/

Column 2: Management consultants to help our customers succeed with IT COPENHAGEN • At Implement Consulting Group, we wish to make a difference in the consulting industry, because we believe that the ability to create Change with Impact is a precondition for success in an increasingly global and turbulent world.

Palle Broe
  • You have to store your results in a list. – t.m.adam Sep 03 '17 at 18:15
  • Thanks Adam. I'm quite new to Python, are you able to quickly show how to create/store the results as a list? – Palle Broe Sep 03 '17 at 18:16
  • Here is my answer to a similar question: [extract-data-from-html-to-csv-using-beautifulsoup](https://stackoverflow.com/questions/45675705/extract-data-from-html-to-csv-using-beautifulsoup/45676970#45676970) – t.m.adam Sep 03 '17 at 18:24
  • So I just have to add in this piece? tables = soup.find_all('table') data = [] for table in tables: previous = table.find_previous_siblings('h2') id = previous[0].get('id') if previous else None rows = [td.get_text(strip=True) for td in table.find_all('td')] data.append([id] + rows) – Palle Broe Sep 03 '17 at 22:47
  • Or which parts of the code you wrote are relevant in my case? – Palle Broe Sep 03 '17 at 22:48
  • The parts that collect the data and write it to csv. Just use Shahin's answer. – t.m.adam Sep 04 '17 at 00:24
  • Sir t.m.adam, please take a look into the link. Sometimes you provide some answer on some complicated stuff which is out of the box and hard to find anywhere, as in .tail.strip() in css selector. https://stackoverflow.com/questions/46028354/unable-to-get-the-full-content-using-selector – SIM Sep 04 '17 at 07:59
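
To make the "store your results in a list" suggestion from the comments concrete, here is a minimal sketch. It is not the code from the linked answer. The `scraped` pairs are made-up stand-ins for `(link.get("href"), link.text)` from the loop in the question, `collect_rows` and `write_rows` are hypothetical helper names, and Python 3 is assumed.

```python
import csv
import io

def collect_rows(pairs):
    """Keep only the Copenhagen career links, as [href, text] rows."""
    rows = []
    for href, text in pairs:
        href = href or ""  # an <a> tag without an href gives None
        if "career" in href and "COPENHAGEN" in text:
            rows.append([href.strip(), text.strip()])
    return rows

def write_rows(rows, fileobj):
    """Write a header line followed by all collected rows."""
    writer = csv.writer(fileobj)
    writer.writerow(["job_link", "job_desc"])
    writer.writerows(rows)

# Made-up sample data standing in for the scraped links
scraped = [
    ("/career/some-job/", "Some job COPENHAGEN - short description"),
    ("/about/", "About us"),
    (None, "COPENHAGEN office"),
]

rows = collect_rows(scraped)

# An in-memory buffer keeps the sketch free of side effects
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, pass a file opened with `open("jobs.csv", "w", newline="")` (the file name is only an example) to `write_rows`.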

1 Answer


Try this script to get the CSV output:

import csv
import requests
from bs4 import BeautifulSoup

# newline='' lets the csv module control line endings (Python 3)
outfile = open('career.csv', 'w', newline='')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res, "lxml")
links = soup.find_all("a")

for link in links:
    # An <a> tag without an href returns None, so fall back to an empty string
    href = link.get("href") or ""
    if "career" in href and 'COPENHAGEN' in link.text:
        item_link = href.strip()
        item_text = link.text.replace("View Position", "").strip()
        writer.writerow([item_link, item_text])
        print(item_link, item_text)
outfile.close()
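
Note that the link text keeps the line breaks from the page, so a description can span several lines inside one CSV cell. If single-line descriptions like the example in the question are wanted, the whitespace can be collapsed before writing. A small sketch; `clean_text` is a hypothetical helper and the sample string is made up from the output shown in the question:

```python
def clean_text(raw):
    """Drop the 'View Position' label and collapse runs of whitespace."""
    without_label = raw.replace("View Position", "")
    # str.split() with no arguments splits on any whitespace, including newlines
    return " ".join(without_label.split())

sample = (
    "\nView Position\n\n"
    "Management consultants within process improvement\n"
    "COPENHAGEN - We are looking for consultants\n"
)
print(clean_text(sample))
```

In the script, this would take the place of the `replace(...).strip()` call, for example `item_text = clean_text(link.text)`.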
SIM
  • Thanks Shahin - this works exactly as I wanted it to. The only part not working is the last piece, outfile.close(): File "", line 7 outfile.close() ^ SyntaxError: invalid syntax – Palle Broe Sep 04 '17 at 22:31
  • I suppose this is because you use Python 2 whereas I use Python 3. I'm not sure, though! However, it runs flawlessly on my end. – SIM Sep 04 '17 at 22:38
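
If the script does have to run on Python 2, one known difference is how the file is opened for the csv module: Python 3 expects text mode with `newline=''`, while Python 2's built-in `open()` does not accept a `newline` argument and its csv documentation asks for binary mode. Whether this is what caused the error reported in the comment is not confirmed here. A sketch with a hypothetical helper, `open_csv_for_writing`:

```python
import csv
import sys

def open_csv_for_writing(path):
    """Open path in the mode the csv module expects on this Python version."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode, csv handles the line endings itself
        return open(path, "w", newline="")
    # Python 2: binary mode, and no newline argument
    return open(path, "wb")
```

The returned file object can be used in place of the plain `open(...)` call, ideally inside a `with` block so it is closed automatically.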