Scrapy: NameError: name 'url' is not defined

Question

When I set start_urls inside a Scrapy spider class, the following code works:

class InfoSpider(scrapy.Spider):
    name = 'info'
    allowed_domains = ['isbn.szmesoft.com']
    isbns = list(set(pd.read_csv('E:/books.csv')['ISBN']))
    url = 'http://isbn.szmesoft.com/isbn/query?isbn='
    start_urls = [url + isbns[0]]

But I get the error NameError: name 'url' is not defined when I rewrite my code as follows:

class InfoSpider(scrapy.Spider):
    name = 'info'
    allowed_domains = ['isbn.szmesoft.com']
    isbns = list(set(pd.read_csv('E:/books.csv')['ISBN']))
    url = 'http://isbn.szmesoft.com/isbn/query?isbn='
    start_urls = [url + isbn for isbn in isbns[:3]]

Maybe I can solve this problem in other ways, but I want to know the reason for the error.

score 2 · Answer 1 · answered Aug 10 '18 at 05:33

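Python resolves names through only four scopes, LEGB (Local, Enclosing, Global, Built-in). In Python 3 a list comprehension runs in its own local scope, and a class body is not a function, so the class body's local scope never becomes the Enclosing scope of the comprehension.

Therefore they are two separate local scopes, and the comprehension cannot see names such as url that are defined in the class body. The one exception is the outermost iterable (isbns[:3] here), which is evaluated in the class body before the comprehension runs; that is why isbns resolves and url does not.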
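The scoping rule can be reproduced without Scrapy. The following is a minimal sketch in plain Python 3; the class names Works and Broken and the ISBN strings are made up for illustration.

```python
class Works:
    url = 'http://isbn.szmesoft.com/isbn/query?isbn='
    isbns = ['111', '222', '333']  # hypothetical values
    # A plain expression is evaluated directly in the class body,
    # so both `url` and `isbns` resolve.
    start_urls = [url + isbns[0]]


def define_broken():
    class Broken:
        url = 'http://isbn.szmesoft.com/isbn/query?isbn='
        isbns = ['111', '222', '333']  # hypothetical values
        # `isbns[:3]` (the outermost iterable) is evaluated in the class body,
        # but `url` is looked up from inside the comprehension's own scope,
        # which skips the class namespace.
        start_urls = [url + isbn for isbn in isbns[:3]]
    return Broken


try:
    define_broken()
    error = None
except NameError as exc:
    error = exc

print(type(error).__name__, error)
```

The class statement is wrapped in a function only so that the exception can be caught and inspected; the same error appears when the class is defined at module level.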

U13-Forward · Answer 2 · 2018-08-10T04:36:48.750


Try moving the assignments into __init__:

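class InfoSpider(scrapy.Spider):
    # Scrapy looks these up on the class, so they stay class attributes
    name = 'info'
    allowed_domains = ['isbn.szmesoft.com']

    def __init__(self, *args, **kwargs):
        # Let the base Spider finish its own setup first
        super().__init__(*args, **kwargs)
        self.isbns = list(set(pd.read_csv('E:/books.csv')['ISBN']))
        self.url = 'http://isbn.szmesoft.com/isbn/query?isbn='
        # Inside a method, self.url and self.isbns are visible to the comprehension
        self.start_urls = [self.url + isbn for isbn in self.isbns[:3]]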

Then, whenever you use one of these attributes, put self. in front of it (for example self.url).
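If the values should stay class attributes rather than move into __init__, one possible alternative is to build the list in a helper function, where the base URL is an ordinary local variable. This is only a sketch in plain Python: build_urls, Demo and the ISBN strings are hypothetical names and values, not part of Scrapy.

```python
def build_urls(base, isbns):
    # Inside a function, `base` is a normal local variable, so the
    # comprehension can reach it through the Enclosing scope.
    return [base + isbn for isbn in isbns]


class Demo:
    url = 'http://isbn.szmesoft.com/isbn/query?isbn='
    isbns = ['9780000000001', '9780000000002',
             '9780000000003', '9780000000004']  # hypothetical values
    # The call and its arguments are evaluated in the class body,
    # where `url` and `isbns` are both visible.
    start_urls = build_urls(url, isbns[:3])


print(Demo.start_urls)
```

The arguments are plain expressions in the class body, so no name lookup has to cross from a comprehension into the class namespace.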

edited Aug 10 '18 at 04:36

answered Aug 10 '18 at 03:14

U13-Forward


Is it a bug in Scrapy? Because my two pieces of code above are so similar – MJ_0826 Aug 10 '18 at 03:30

@MJ_0826 It's not a bug in Scrapy; Scrapy isn't causing this, it's how the class body works – U13-Forward Aug 10 '18 at 03:31
@MJ_0826 What do you mean? – U13-Forward Aug 10 '18 at 04:29
the first code 'url' is defined but the second is not, the only difference is the last line – MJ_0826 Aug 10 '18 at 04:35
@MJ_0826 Edited my answer – U13-Forward Aug 10 '18 at 04:36
@MJ_0826 The error is NameError so that's why need an `__init__` – U13-Forward Aug 10 '18 at 04:40
@MJ_0826 See: https://stackoverflow.com/questions/625083/python-init-and-self-what-do-they-do – U13-Forward Aug 10 '18 at 04:40
@U9-Forward I think last line won't be coming in __init__ and should be in separate function – Upasana Mittal Aug 10 '18 at 04:44
@UpasanaMittal Yeah that's another option, but in the OP's example he wanted it all as a class variables – U13-Forward Aug 10 '18 at 04:45
I know the reason, it's because of 'for', you can see my answer below – MJ_0826 Aug 10 '18 at 05:30

score 0 · Answer 3 · answered Aug 10 '18 at 03:27


You need to convert each ISBN to a string before concatenating it. Also try printing the URLs so that you can open them in a browser and check whether they actually exist.

start_urls = [url + str(isbn) for isbn in isbns[:3]]
print(start_urls)
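As a side note on the str() call: if the ISBN column contains only digits, pandas may read it as integers, and concatenating a string with an integer raises a TypeError rather than the NameError from the question. A small sketch with a made-up integer value:

```python
url = 'http://isbn.szmesoft.com/isbn/query?isbn='
isbn = 9780000000001  # hypothetical ISBN, stored as an int

try:
    url + isbn  # str + int is not allowed
    error = None
except TypeError as exc:
    error = exc

fixed = url + str(isbn)  # convert first, then concatenate
print(type(error).__name__)
print(fixed)
```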

answered Aug 10 '18 at 03:27

Upasana Mittal


@MJ_0826 But the only change you have done is in `line 6` – Upasana Mittal Aug 10 '18 at 04:08
that's why I am very confused – MJ_0826 Aug 10 '18 at 04:22
@MJ_0826 Can you post the stack trace? I want to see which line is raising the error. – Upasana Mittal Aug 10 '18 at 04:30
I know the reason, it's because of 'for', you can see my answer below – MJ_0826 Aug 10 '18 at 05:31
