I'm writing a scraper that opens a CSV file, reads a list of links, extracts a specific HTML tag from each page (the speeches), and saves the content in a TXT file named after the day the speech was given.

Here is what I have so far:

#encoding:utf-8
import csv
import urllib
import lxml.html
import unicodedata

objeto = csv.reader(open('links.csv', 'rU'), dialect=csv.excel_tab)

for link in objeto:
    connection = urllib.urlopen(link[0])
    dom = lxml.html.fromstring(connection.read())
    discurso = []
    for d in dom.xpath('//div[@id="content-core"]/div/p/text()'):
        discurso.append(d)
    d1 = " ".join(discurso)
    data = dom.xpath('//span[@class="documentPublished"]/text()[normalize-space()]')
    data1 = [date.strip() for date in data]
    make_string = "-".join(data1)
    arquivo = open(make_string+'.txt', 'w')
    arquivo.write(d1)
    arquivo.close()

I was able to extract the date and the speech, but the final step is not working. When I try to save the speech in a TXT file, IDLE shows this message:

IOError: [Errno 2] No such file or directory: '17/12/2010 23h39,.txt'

I've tried both 'w' and 'a' modes when opening the file, but both fail. What am I doing wrong?

  • This has been asked countless times. Check the related column to the right of your question and you'll find [this](https://stackoverflow.com/questions/29493444/ioerror-errno-2-no-such-file-or-directory-python?rq=1), [this](https://stackoverflow.com/questions/18067799/ioerror-errno-2-no-such-file-or-directory?rq=1), [this](https://stackoverflow.com/questions/19819099/virtualenv-cant-create-virtualenv-ioerror-errno-2-no-such-file-or-director?rq=1), etc. Usual issues: relative vs. absolute paths, missing quotes, missing escape characters, forward/backslashes, spaces in file names. – jDo Apr 10 '16 at 20:01
  • @jDo no, none of those questions is related to OP's problem, nor are absolute paths. Note that the code is writing to a file, so it need not exist beforehand. The problem is the slashes indicating directories. – Alex Hall Apr 10 '16 at 20:08

1 Answer

The problem is that / is the path separator, so open() expects to find a directory named 17 with a subdirectory 12 under it. Opening a file in 'w' mode creates the file itself, but not any missing directories, hence the error. I suggest replacing all / characters with -.
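
As a rough sketch of that replacement, something like the following could build the file name. The helper name safe_filename and its extension parameter are made up for illustration, and stripping the trailing comma is an extra assumption based on the name shown in the error message, not something the question asks for.

```python
import re


def safe_filename(date_parts, extension=".txt"):
    """Build a file name from scraped date fragments (hypothetical helper)."""
    # Join the non-empty fragments with "-", as the question's code does.
    name = "-".join(part.strip() for part in date_parts if part.strip())
    # Replace forward and back slashes, which open() treats as directory separators.
    name = re.sub(r"[/\\]", "-", name)
    # Assumption: trailing commas and spaces are unwanted in the file name.
    name = name.strip(" ,")
    return name + extension


print(safe_filename(["17/12/2010 23h39,"]))  # prints 17-12-2010 23h39.txt
```

In the loop from the question, the result would be passed to open() in place of make_string+'.txt'.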

Alex Hall