20

I am new to Python and am getting this error:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 4, in <module>
    execute()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 130, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 96, in _run_print_help
    func(*a, **kw)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/cmdline.py", line 136, in _run_command
    cmd.run(args, opts)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/commands/crawl.py", line 42, in run
    q = self.crawler.queue
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/command.py", line 31, in crawler
    self._crawler.configure()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/crawler.py", line 36, in configure
    self.spiders = spman_cls.from_settings(self.settings)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 33, in from_settings
    return cls(settings.getlist('SPIDER_MODULES'))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/spidermanager.py", line 23, in __init__
    for module in walk_modules(name):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scrapy/utils/misc.py", line 65, in walk_modules
    submod = __import__(fullpath, {}, {}, [''])
  File "/my_crawler/empt/empt/spiders/empt_spider.py", line 59
    check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']
    ^
IndentationError: unexpected indent

On this bit of code:

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    sites = hxs.select('//a[contains(@href, ".mp3")]/@href').extract()
    items = [ ]

    #for site in sites:
        #link = site.select('a/@href').extract()
        #print site
    for site in sites:
        item = EmptItem()
        item['link'] = site #site.select('a/@href').extract()

        #### DB INSERT ATTEMPT ###
        #MySQL Test

        #open db connection
        db = MySQLdb.connect("localhost","root","str0ng","TESTDB")

        #prepare a cursor object using cursor() method
        cursor = db.cursor()

        #see if any links in the DB match the crawled link
        check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']

        cursor.execute(check_exists_sql)

        if cursor.rowcount == 0:
            #prepare SQL query to insert a record into the db.
            sql = "INSERT INTO LINKS ( link ) VALUES ( '%s')" % item['link']

            try:
                #execute the sql command
                cursor.execute(sql)
                #commit your changes to the db
                db.commit()
            except:
                #rollback on error
                db.rollback()

                #fetch a single row using fetchone() method.
                #data = cursor.fetchone()

                #print "Database version: %s " % data

            #disconnect from server
            db.close()

            ### end mysql

        items.append(item)
    return items
SilentGhost
ian
  • http://docs.python.org/tutorial/introduction.html: "each line within a basic block must be indented by the same amount". (Wrong use of the term "basic block", incidentally...) – Glenn Maynard Oct 13 '10 at 03:52
  • How to deal with `IndentationError`: 1) Make sure your lines are indented properly, remembering that Python thinks that tab stops are every 8 columns. 2) Look for a missing colon on the line above, which is usually a `for`, `if`, `else`, `while`, `try`, or similar type of line. In general, if a line ends in a colon, the next line with code needs to be indented by some amount. – Mike DeSimone Oct 13 '10 at 04:03
  • Sorry, I didn't paste this properly into Stack Overflow; I have fixed it now to match my code. – ian Oct 13 '10 at 04:05

6 Answers

42

While the indentation errors are obvious on the Stack Overflow page, they may not be in your editor. You have a mix of different indentation widths here: 1, 4 and 8 spaces. PEP 8 recommends four spaces per indentation level, and you should avoid mixing tabs and spaces.

I also recommend running your script with the '-tt' command-line option, which under Python 2 (your traceback shows 2.6) turns inconsistent mixing of tabs and spaces into an error, so you find out as soon as it happens. Any decent editor can also display tabs differently from spaces (for example, Vim's 'list' option).
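If your editor can't show the difference, a few lines of Python can point at the suspicious lines. This is only a rough sketch: the function name find_mixed_indentation and the sample string are made up for illustration, and it only flags lines whose own leading whitespace contains both a tab and a space. It will not notice a file where some lines use only tabs and others use only spaces.

```python
def find_mixed_indentation(source):
    """Return 1-based line numbers whose leading whitespace mixes tabs and spaces."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever precedes the first non-blank character.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            flagged.append(lineno)
    return flagged


# Hypothetical sample: the third line is indented with a tab followed by spaces.
sample = "for site in sites:\n    item = site\n\t    link = item\n"
print(find_mixed_indentation(sample))
```

To check a real file, read its contents into a string (for example with open(path).read()) and pass that string in.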

Community
johnsyweb
  • There's nothing wrong with mixing tabs and spaces (though Python3, in a fit of braindamage, throws an error if you do). However, you should never, ever set the width of a tab character to anything but 8 spaces; that's simply the definition of ^I. – Glenn Maynard Oct 13 '10 at 05:21
  • @Glenn: The definition of TAB is not 8 spaces, it's "move to the tabstop". You're partly correct in that there's nothing wrong with mixing tabs and spaces, but only if nobody else is ever going to look at your code. I work with plenty of people who use heretical tab widths, and the only way to cope is to use only spaces for indentation. – Matt Curtis Oct 13 '10 at 06:19
  • @Matt: A tabstop in a plain text file is 8 characters; anything else is wrong. That's not to say that indenting *code* by other amounts is wrong, of course, but that's completely independent of what the ^I character means. The correct fix is to tell people to fix their tab stops, which is no harder (easier, in most editors) than disabling tabs. – Glenn Maynard Oct 13 '10 at 07:26
  • @Glenn: Wrong, sure, but I would not reformat a legacy source tree because of that - that's a good way to annoy a whole development team. Another good way is to tell them all to change their editors (particularly those which use an 8 column default, i.e. they've already made a conscious choice to change it.) – Matt Curtis Oct 14 '10 at 06:02
  • @Matt: Telling them to switch to spaces for indentation requires reformatting a source tree just as much as fixing the tab stops. – Glenn Maynard Oct 14 '10 at 06:21
  • @Glenn: true, so ITRW I just curse and set my tab width to 4 and get on with editing the code. The pain comes from when some people have used tabs and others spaces in the same source file, then you have to guess whether they've customised their tab width. In C++ this just makes for messy code, but in Python it's a world of hurt. – Matt Curtis Oct 16 '10 at 00:18
  • @Matt: I've never found this to be a problem in Python2 code, because if someone tries to mix tabs and spaces while using nonstandard tab stops, it almost always triggers an indentation error immediately, and if it does happen to compile it'll fail spectacularly when run. (Even better: it's always the person using wrong tabs who has these problems, not me.) I think what bothers me most about Python3's tab rule change is that it's designed to make it *easier* for people to use nonstandard tabstops, and in the process, making things more complicated for people who don't. – Glenn Maynard Oct 16 '10 at 01:02
4

The indentation is wrong, as the error tells you. As you can see, you have indented the code beginning with the indicated line too little to be in the for loop, but too much to be at the same level as the for loop. Python sees the lack of indentation as ending the for loop, then complains you have indented the rest of the code too much. (The def line I'm betting is just an artifact of how Stack Overflow wants you to format your code.)

Edit: Given your correction, I'm betting you have a mixture of tabs and spaces in the source file, such that it looks to the human eye like the code lines up, but Python considers it not to. As others have suggested, using only spaces is the recommended practice (see PEP 8). If you start Python with python -t, you will get warnings if there are mixed tabs and spaces in your code, which should help you pinpoint the issue.
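If you want to see which line Python objects to without actually running the crawler, one option is to hand the source to the built-in compile() and catch the IndentationError. The sketch below is only an illustration: first_indentation_error and the three-line snippet are invented names and data, not part of your spider.

```python
def first_indentation_error(source, filename="<example>"):
    """Compile source without running it; return (line, message) or None."""
    try:
        compile(source, filename, "exec")
    except IndentationError as exc:
        # lineno is the line the parser complained about, msg is the short reason.
        return exc.lineno, exc.msg
    return None


# Hypothetical snippet: the third line is indented further than the loop body.
snippet = (
    "for site in sites:\n"
    "    item = site\n"
    "      check_exists_sql = item\n"
)
print(first_indentation_error(snippet))
```

Because compile() only parses the code, undefined names such as sites don't matter here; only the layout is checked.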

kindall
  • Both were actually caused by Stack Overflow; I've made it display correctly now. – ian Oct 13 '10 at 04:06
1

The error is pretty straightforward - the line starting with check_exists_sql isn't indented properly. From the context of your code, I'd indent it and the following lines to match the line before it:

   #open db connection
   db = MySQLdb.connect("localhost","root","str0ng","TESTDB")

   #prepare a cursor object using cursor() method
   cursor = db.cursor()

   #see if any links in the DB match the crawled link
   check_exists_sql = "SELECT * FROM LINKS WHERE link = '%s' LIMIT 1" % item['link']

   cursor.execute(check_exists_sql)

And keep indenting the lines that follow the same way until the for loop ends (all the way through to and including items.append(item)).

Chris Bunch
0

As the error says, your code is not indented correctly: the check_exists_sql line is not aligned with the cursor = db.cursor() line above it.

Also use 4 spaces for indentation.

Read this http://diveintopython.net/getting_to_know_python/indenting_code.html

Mr.Wizard
Anurag Uniyal
0
import urllib.request
import requests
from bs4 import BeautifulSoup

        r = requests.get('https://icons8.com/icons/set/favicon')

If you run code like this, where the r = requests.get(...) line is indented even though it is not inside any block, you will get an IndentationError.

import urllib.request
import requests
from bs4 import BeautifulSoup


r = requests.get('https://icons8.com/icons/set/favicon')

With the stray indent removed, the code runs. Python treats indentation as part of the syntax.

ihsan güç
-1

This error occurs when blocks are not written correctly: for example, forgetting a ":", or mixing tabs and spaces within a block. It can easily happen when you move code from one editor to another. And never forget this: the error isn't always on the reported line. I came here with this error, but what I had actually done was forget an except after a try. It happened because of my nonstandard editor, but it's possible in a normal editor too.
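Here is a small sketch of the "not always on that line" point, assuming Python 3. The helper syntax_error_line and the broken snippet are made up for the demonstration; the exact message and line number can differ between Python versions, so the example only prints what your interpreter reports.

```python
def syntax_error_line(source, filename="<example>"):
    """Return the line number Python reports for a syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return exc.lineno
    return None


# The mistake is the 'try' on line 1, which has no 'except' or 'finally'.
broken = (
    "try:\n"
    "    value = 1\n"
    "print(value)\n"
)
print("try is on line 1, error reported on line", syntax_error_line(broken))
```

The line Python points at is where it gave up, which comes after the try that is actually missing its except.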

TheGreenM