
I am new to Python and Scrapy. I copied this code from a video; it worked fine in the video, but when I try it I get `TypeError: 'float' object is not iterable`. Here is the code:

import scrapy

class StackOverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ["http://stackoverflow.com/questions?sort=votes"]

    def parse(self, response):
        for href in response.css('.question-summary h3 a::attr(href)'):
            full_url = response.urljoin(href.extract())
            yield scrapy.Request(full_url, callback=self.parse_question)

    def parse_question(self, response):
        yield {
            'title': response.css('h1 a::text').extract()[0],
            'votes': response.css(".question.vote-count-post::text").extract()[0],
            'body': response.css(".question.post-text").extract()[0],
            'tags': response.css(".question.post-tag::text").extract(),
            'link': response.url,
        }

Here is the error:

2017-03-10 16:06:39 [scrapy] INFO: Enabled item pipelines:[]
2017-03-10 16:06:39 [scrapy] INFO: Spider opened
2017-03-10 16:06:39 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-03-10 16:06:39 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-03-10 16:06:40 [scrapy] ERROR: Error downloading <GET http://stackoverflow.com/questions?sort=votes>
Traceback (most recent call last):
  File "C:\Anaconda2\lib\site-packages\twisted\internet\defer.py", line 1299, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "C:\Anaconda2\lib\site-packages\twisted\python\failure.py", line 393, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "C:\Anaconda2\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred
    result = f(*args, **kw)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request
    return handler.download_request(request, spider)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 60, in download_request
    return agent.download_request(request)
  File "C:\Anaconda2\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 285, in download_request
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1631, in request
    parsedURI.originForm)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1408, in _requestWithEndpoint
    d = self._pool.getConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1294, in getConnection
    return self._newConnection(key, endpoint)
  File "C:\Anaconda2\lib\site-packages\twisted\web\client.py", line 1306, in _newConnection
    return endpoint.connect(factory)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\endpoints.py", line 788, in connect
    EndpointReceiver, self._hostText, portNumber=self._port
  File "C:\Anaconda2\lib\site-packages\twisted\internet\_resolver.py", line 174, in resolveHostName
    onAddress = self._simpleResolver.getHostByName(hostName)
  File "C:\Anaconda2\lib\site-packages\scrapy\resolver.py", line 21, in getHostByName
    d = super(CachingThreadedResolver, self).getHostByName(name, timeout)
  File "C:\Anaconda2\lib\site-packages\twisted\internet\base.py", line 276, in getHostByName
    timeoutDelay = sum(timeout)
TypeError: 'float' object is not iterable
2017-03-10 16:06:40 [scrapy] INFO: Closing spider (finished)
2017-03-10 16:06:40 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 1,
 'downloader/exception_type_count/exceptions.TypeError': 1,
 'downloader/request_bytes': 235,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 3, 10, 8, 6, 40, 117000),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2017, 3, 10, 8, 6, 39, 797000)}
2017-03-10 16:06:40 [scrapy] INFO: Spider closed (finished)
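The traceback bottoms out in Twisted's `getHostByName`, where `timeoutDelay = sum(timeout)` assumes `timeout` is an iterable of per-retry delays; if a caller hands it a bare float instead, `sum()` raises exactly this error. The failing line can be reproduced on its own, without Scrapy:

```python
# Stand-alone reproduction of the failing line in twisted/internet/base.py:
#     timeoutDelay = sum(timeout)
# Twisted expects `timeout` to be an iterable of retry delays.

timeout_as_tuple = (1, 3, 11, 45)  # the shape Twisted expects
print(sum(timeout_as_tuple))       # 60

timeout_as_float = 60.0            # the shape that triggers the crash
try:
    sum(timeout_as_float)
except TypeError as exc:
    print(exc)                     # 'float' object is not iterable
```

This is why the error points at a Scrapy/Twisted version mismatch in the installed environment rather than at the spider code itself.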

thanks for your help!

VoidBug

2 Answers


Your code works for me in Python 3, but the scraped items come back as empty lists. After deleting the `[0]` index and running it again, I get:

2017-03-10 16:48:34 [scrapy.core.scraper] DEBUG: Scraped from <200 http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits>
{'link': 'http://stackoverflow.com/questions/179123/how-to-modify-existing-unpushed-commits', 'title': ['How to modify existing, unpushed commits?'], 'votes': [], 'body': [], 'tags': []}
宏杰李
  • Thanks, but I don't quite understand. Do you mean my code only works in Python 3? Scrapy only supports Python 2.7 on Windows, and in the video the teacher also uses Windows. Could you tell me in detail what I should do? How should I change the code? – VoidBug Mar 10 '17 at 09:04
  • @King.Lee Your code stopped at the first request, but in my environment it works fine. The only problem is `response.css(".question.post-text").extract()[0]` — you are indexing an empty list. When I delete the index, it returns an empty list, as I posted. – 宏杰李 Mar 10 '17 at 09:07
  • I'm sorry to say that even after your explanation, my problem is still not solved. I even switched my OS from Windows to Ubuntu and used Python 3.6, installing Scrapy through Anaconda, but the code still fails in the same way. I am very confused — is there something wrong with my environment? – VoidBug Mar 12 '17 at 02:33
  • Thanks a lot for your help! I have solved the problem: it was caused by Anaconda, which I find surprising, since I thought Anaconda was an official Python distribution. I ran `scrapy bench` to test whether my environment itself was broken, and the TypeError was still there, so I knew the Anaconda install was not reliable. After I configured my environment step by step, the TypeError went away. But, just as you said, the indexes in `response.css(".question.vote-count-post::text").extract()[0]` and `response.css(".question.post-text").extract()[0]` are out of range, and I don't know why. – VoidBug Mar 12 '17 at 09:12
  • @King.Lee use virtualenv to get an isolated environment. – 宏杰李 Mar 12 '17 at 10:27
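Two notes on the empty-list discussion above, hedged as my own reading rather than anything confirmed by the original posters. First, a CSS selector like `.question.post-text` matches a single element carrying *both* classes, whereas `.question .post-text` (with a space) matches a `.post-text` descendant of `.question` — the missing space is a plausible reason the selectors match nothing. Second, Scrapy's selectors offer `extract_first()` (or `.get()` in newer releases), which returns a default instead of raising `IndexError` on an empty result. In plain Python the same guard looks like this:

```python
def first(matches, default=None):
    # Plain-Python equivalent of Scrapy's extract_first():
    # return the first element of a selector's result list, or a
    # default when nothing matched, instead of letting
    # `matches[0]` raise IndexError on an empty list.
    return matches[0] if matches else default

print(first(['How to modify existing, unpushed commits?']))  # first match
print(first([], default='0'))                                # fallback, no crash
```

Inside the spider, that would mean writing `response.css(".question .vote-count-post::text").extract_first(default='0')` and similar for the other fields.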

I know it's an old question, but I found a different solution in my case: try `conda install scrapy` instead of `pip install scrapy`.

These are the dependencies that were installed after running the command:

The following NEW packages will be INSTALLED:

    attrs:            15.2.0-py27_0
    automat:          0.5.0-py27_0
    constantly:       15.1.0-py27_0
    cssselect:        1.0.1-py27_0
    hyperlink:        17.1.1-py27_0
    incremental:      16.10.1-py27_0
    parsel:           1.2.0-py27_0
    pyasn1:           0.2.3-py27_0
    pyasn1-modules:   0.0.8-py27_0
    pydispatcher:     2.0.5-py27_0
    queuelib:         1.4.2-py27_0
    scrapy:           1.3.3-py27_0
    service_identity: 17.0.0-py27_0
    twisted:          17.5.0-py27_0
    w3lib:            1.17.0-py27_0
    zope:             1.0-py27_0
    zope.interface:   4.4.2-py27_0
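Combining this with the virtualenv suggestion in the comments above, a sketch of setting up an isolated conda environment first, so conda's mutually compatible Scrapy/Twisted builds don't mix with any pip-installed copies (the environment name `scrapy_env` is made up for illustration; assumes conda is on PATH):

```shell
# Create a clean Python 2.7 environment just for Scrapy.
conda create -n scrapy_env python=2.7
source activate scrapy_env   # "conda activate scrapy_env" on newer conda
conda install scrapy
scrapy version -v            # sanity check: prints Scrapy, Twisted, lxml, etc.
```

If `scrapy bench` then runs without the TypeError, the problem was the mixed environment rather than the spider.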