
[screenshot: the target table viewed in a SQLite browser, showing no rows]

I'm working with Scrapy and dataset (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data), which is a layer on top of SQLAlchemy, trying to load data into an SQLite table as a follow-up to Sqlalchemy : Dynamically create table from Scrapy item.

Using the dataset package I have:

import dataset
from sqlalchemy.exc import IntegrityError

# `settings` is the Scrapy project's settings module, imported elsewhere


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
        db = dataset.connect(db_path)
        self.table = db[table_name].table

    def process_item(self, item, spider):
        try:
            print('TEST DATASET..')
            self.table.insert(dict(name='John Doe', age=46, country='China'))
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item

After running my spider I see the print statements from the try/except block, with no errors, but when the crawl completes and I look in the table (see the screenshot above), there is no data in it. What am I doing wrong?

user1592380

2 Answers


The code you posted does not work as-is for me:

TypeError: __init__() takes exactly 2 arguments (1 given)

That's because the __init__ method expects a table_name argument that is not being passed. You need to implement the from_crawler class method on your pipeline, something like:

@classmethod
def from_crawler(cls, crawler):
    return cls(table_name=crawler.spider.name)

That creates the pipeline object using the spider name as the table name; you can of course use any name you want (a settings note follows below).
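As a side note, the pipeline also has to be enabled in the project's settings.py for Scrapy to call it at all. The module path below is a placeholder for wherever your pipeline class actually lives:

# settings.py -- 'myproject.pipelines' is a hypothetical module path
ITEM_PIPELINES = {
    'myproject.pipelines.DynamicSQLlitePipeline': 300,
}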

Also, the line self.table = db[table_name].table should be replaced by self.table = db[table_name] (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data). db[table_name] is already dataset's table wrapper; the extra .table attribute hands you the underlying SQLAlchemy Table object, whose insert() method merely builds an INSERT statement without executing it. That is why your prints ran without errors but no rows were written.
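Putting both fixes together, a minimal sketch of the pipeline could look like this (the bare sqlite path and the logging call are my placeholders, not from the original code):

import dataset
from sqlalchemy.exc import IntegrityError


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        # Placeholder path; point this at your project's database file
        db = dataset.connect("sqlite:///data.db")
        self.table = db[table_name]  # the dataset Table wrapper, no .table

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this to construct the pipeline;
        # here the spider name doubles as the table name
        return cls(table_name=crawler.spider.name)

    def process_item(self, item, spider):
        try:
            self.table.insert(dict(item))
        except IntegrityError:
            spider.logger.info('Skipping duplicate item: %r', item)
        return item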

After that, the data is stored: [screenshot: the populated table in DB Browser for SQLite]

elacuesta
  • Thank you, I removed the @classmethod from my code to simplify it, sorry. BTW, what DB GUI are you using? – user1592380 Dec 22 '16 at 16:17
  • Actually, the @classmethod decorator is necessary because the from_crawler method is invoked as a class method :-) As for the GUI, it's http://sqlitebrowser.org – elacuesta Dec 22 '16 at 16:54
  • Thank you. Can I ask one more question: is it possible to set the order of the fields, given that the dict passed to the insert command is unordered? – user1592380 Dec 22 '16 at 16:59

Maybe there's a problem with the DB connection. Put this snippet into a try/except to check for the problem:

import traceback

try:
    db_path = "sqlite:///" + settings.SETTINGS_PATH + "\\data.db"
    db = dataset.connect(db_path)
    self.table = db[table_name].table
except Exception:
    traceback.print_exc()
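If no exception shows up, another quick sanity check is to read the row back right after inserting it, using dataset's query helpers on db[table_name] (note: the wrapper itself, without the .table attribute):

table = db[table_name]  # dataset's Table wrapper
table.insert(dict(name='John Doe', age=46, country='China'))
print(table.find_one(name='John Doe'))  # None means the row never landed
print(db.tables)  # table names visible on this connection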
FallAndLearn
  • Thanks for looking at this. I tried it, no errors. I also added print(db.tables) to the code, and the correct tables are printed out in a list as expected, so it appears the connection is OK. – user1592380 Dec 22 '16 at 00:59