
I am getting an error in the Scrapy framework. This is my dmoz.py under the spiders directory:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

from dirbot.items import Website


class DmozSpider(BaseSpider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    f = open("links.csv")
    start_urls = [url.strip() for url in f.readlines()]
    f.close()

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//ul/li')
        items = []

        for site in sites:
            item = Website()
            item['name'] = site.select('a/text()').extract()
            item['url'] = site.select('a/@href').extract()
            item['description'] = site.select('text()').extract()
            items.append(item)

        return items

I am getting this error while running this code:

<GET %22http://www.astate.edu/%22>: Unsupported URL scheme '': no handler available for that scheme in Scrapy

Here's the content of my links.csv:

http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/
http://www.atsu.edu/

There are 80 URLs in links.csv. How can I resolve this error?

Joyfulgrind
  • As a side note you should move the CSV file reading outside the attribute declaration section, perhaps in a tiny static or class method (I'm not familiar with scrapy). Also have a look at the answers for [this question](http://stackoverflow.com/questions/9322219/how-to-generate-the-start-urls-dynamiclly-in-crawling), which suggest overriding the `start_requests` method. – Cristian Ciupitu Nov 08 '12 at 10:26
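A minimal sketch of that suggestion (the helper name `load_start_urls` is hypothetical, and the Scrapy-specific part is only indicated in a comment since it depends on Scrapy's `Request` API): the file reading moves out of the class body into a small function that also tolerates quoted lines.

```python
import csv

def load_start_urls(path):
    """Read one URL per line; csv.reader strips optional surrounding quotes."""
    with open(path) as f:
        return [row[0] for row in csv.reader(f) if row]

# Inside a spider you would then override start_requests, roughly:
#     def start_requests(self):
#         for url in load_start_urls("links.csv"):
#             yield Request(url)
```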

1 Answer


`%22` is `"` URL-encoded. Your CSV file probably has lines like this:

"http://example.com/"

Either:
  1. use the `csv` module to read the file, or
  2. strip the `"`s.

Edit: As requested:

'"http://example.com/"'.strip('"')
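To see why `%22` shows up in the error message, note that the stray quotes get percent-encoded when the URL is requested. A quick check with the standard library (shown with Python 3's `urllib.parse`; in Python 2 the same functions live in `urllib`):

```python
from urllib.parse import quote, unquote

# The surrounding double quotes are percent-encoded as %22
print(quote('"http://www.astate.edu/"', safe=":/"))
# and %22 decodes back to a double quote
print(unquote("%22"))
# stripping the quotes first avoids the problem entirely
print('"http://www.astate.edu/"'.strip('"'))
```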

Edit 2:

import csv
from StringIO import StringIO

c = '"foo"\n"bar"\n"baz"\n'      # Since csv.reader needs a file-like-object,
reader = csv.reader(StringIO(c)) # wrap c into a StringIO.
for line in reader:
    print line[0]

LAST Edit:

import csv

with open("links.csv") as f:
    r = csv.reader(f)
    start_urls = [l[0] for l in r]
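As a quick sanity check of that approach (using an in-memory sample instead of the real links.csv, and Python 3's `io.StringIO` rather than the Python 2 `StringIO` module used above), `csv.reader` yields the URLs with the quotes already removed:

```python
import csv
from io import StringIO

# Two quoted lines, like the ones presumably in links.csv
sample = '"http://www.atsu.edu/"\n"http://www.astate.edu/"\n'
start_urls = [row[0] for row in csv.reader(StringIO(sample))]
print(start_urls)  # ['http://www.atsu.edu/', 'http://www.astate.edu/']
```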