I am trying to scrape data from a news website using Selenium in Jupyter. The scraper works fine and finds all the elements, but when I write them to a .csv file it keeps overwriting the first row.
Below is the code to write the header row of the file.
with open('news_articles.csv', 'w') as file:
    file.write("Date; Headline \n")
The following code scrapes the desired elements from the webpages.
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is an already-initialized Selenium WebDriver.
for year in range(2011, 2013):
    driver.get('http://www.sharechat.co.nz/news-archive.html?year=' + str(year))  # To open year page
    time.sleep(4)
    try:
        # Close the ad overlay if one is shown
        ad = driver.find_element_by_xpath('//div[@class="ns-ek2m2-e-20"]/span')
        ad.click()
    except Exception:
        pass
    for i in range(14, 16):
        # To open headline
        link = WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
            (By.XPATH, "//section[@class='col-md-8 content']/a[" + str(i) + "]")))
        driver.execute_script("arguments[0].click();", link)
        time.sleep(2)
        try:
            ad = driver.find_element_by_xpath('//div[@class="ns-ek2m2-e-20"]/span')
            ad.click()
        except Exception:
            pass
        headline = driver.find_element_by_xpath("//section[@class='col-md-8 content']/h1")
        date = driver.find_element_by_xpath("//p[@class='small']")
        #print(date.text)
        #print(headline.text)
        article = driver.find_elements_by_xpath("//div[@class='contentarticle']/p")
        noOfElements = len(article)
        # Print every paragraph except the last three
        for p in range(noOfElements - 3):
            print(article[p].text)
Now, when I try to write the scraped elements to the same file on the following rows, the same row keeps getting overwritten.
with open('news_articles.csv', 'w') as file:
    file.write(date.text + "; " + headline.text + "\n")
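
For reference, here is a minimal sketch that reproduces the behavior without Selenium, assuming the write above runs once per scraped article. Mode 'w' truncates the file every time it is opened, whereas mode 'a' appends to the end. The sample rows, the file names, and the helpers write_rows and read_lines are made up for illustration and are not part of the scraper.

```python
import os
import tempfile


def write_rows(path, rows, mode):
    """Reopen the file for every row with the given mode, like a write inside a loop."""
    for date, headline in rows:
        with open(path, mode) as f:
            f.write(date + "; " + headline + "\n")


def read_lines(path):
    """Return the file contents as a list of lines without trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()


# Made-up sample data standing in for date.text and headline.text
rows = [("01 Jan 2011", "First headline"), ("02 Jan 2011", "Second headline")]

tmp_dir = tempfile.mkdtemp()
path_w = os.path.join(tmp_dir, "with_w.csv")
path_a = os.path.join(tmp_dir, "with_a.csv")

# Write the header once to both files, as in the first snippet
for path in (path_w, path_a):
    with open(path, "w") as f:
        f.write("Date; Headline\n")

write_rows(path_w, rows, "w")  # every open truncates the file, header included
write_rows(path_a, rows, "a")  # every open appends after the existing content

lines_w = read_lines(path_w)
lines_a = read_lines(path_a)
print(lines_w)
print(lines_a)
```

With 'w', only the last row written remains in the file; with 'a', the header and both rows are kept.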