I am trying to save HTML pages together with their category using Scrapy in Python. I would like each file to be named like 'WebCategory_http://whatever.com'. This is the code I use:

def parse(self, response):
    url = response.url
    cat = str(response.meta['cat'])
    filename = cat + '_' + url
    with open(filename, 'wb') as f:
        f.write(response.body)

Running it produces this error:

IOError: [Errno 2] No such file or directory: 'Arts_https://www.behindthevoiceactors.com/'
2018-11-19 15:43:15 [scrapy.extensions.logstats] INFO: Crawled 45 pages (at 45 pages/min), scraped 0 items (at 0 items/min)

My guess is that '/' is interpreted as a path separator instead of as part of the filename. Is there any way to keep using '/'?

Matthieu Brucher

2 Answers

No, / is not a valid part of a filename in most filesystems. You need to replace it with a different character.

Tordek

No, you can't use / in a file name. It is the path separator, so it is a reserved character (on this system).

Replace the character with something else, for instance:

filename = str(cat) + '_' + str(url).replace('/', '_')
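
If the original URL needs to be recoverable from the file name, one alternative to a plain replace is percent-encoding. The following is only a minimal sketch, assuming Python 3 and a category string that itself contains no '/' or '_'; the helper name safe_filename is hypothetical and not part of Scrapy.

```python
from urllib.parse import quote, unquote


def safe_filename(cat, url):
    """Build 'Category_<percent-encoded URL>' (hypothetical helper)."""
    # safe='' makes quote() encode '/' and ':' as well, so the result
    # contains no path separators.
    return '{}_{}'.format(cat, quote(url, safe=''))


name = safe_filename('Arts', 'https://www.behindthevoiceactors.com/')
print(name)

# The URL can be restored later by decoding everything after the first '_'.
original_url = unquote(name.split('_', 1)[1])
print(original_url)
```

Inside parse(), the result would be passed to open(..., 'wb') in place of the original filename. Unlike replace('/', '_'), this mapping is reversible, at the cost of less readable file names.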
Matthieu Brucher