
When I try to commit changes to the table, it takes a long time (about 300 seconds for every 1000 rows).

Model class:

import time

from sqlalchemy import Column, FLOAT, INTEGER, TEXT
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.orm import sessionmaker


class Quotes(base):
    __tablename__ = 'quotes'

    id = Column(INTEGER, primary_key=True, autoincrement=True, index=True)
    ticker = Column(TEXT)
    data = Column(ARRAY(FLOAT))

    @staticmethod
    def create():
        # "db" is the engine created elsewhere
        Session = sessionmaker(db)
        session = Session()
        return session

    @staticmethod
    def add(session, ticker=None, data=None):
        new_row = Quotes(ticker=ticker, data=data)
        session.add(new_row)

    @staticmethod
    def commit(session):
        t0 = time.time()
        session.commit()
        print("SQLAlchemy ORM commit(): Total time " + str(time.time() - t0) + " secs")

The calling code looks like this:

    session = Quotes.create()
    counter = 0

    for index in range(len(quotes['timestamp'])):
        temp = [
            int(quotes['timestamp'][index]), quotes['open'][index],
            quotes['high'][index], quotes['low'][index],
            quotes['close'][index]
        ]
        Quotes.add(session, ticker, temp)
        counter += 1

        if counter % 1000 == 0:
            Quotes.commit(session)

I have about 100k rows. At the start I create a session, then I add a new row to the session on each iteration of the for loop. I call the .commit() method every 1000 rows, and it takes about 300 seconds. That seems too slow to me. Is this normal, or is there another (correct) way to insert such a large data set into the database?

  • Rather than committing every 1000 rows, why not flush every 1000 rows and commit once at the end? – mad_ Aug 13 '18 at 13:19
  • It'd require some manual data handling, but you might be interested in COPY. Also, is your table index/trigger heavy? 300 s for 1000 rows sounds like an awful lot. – Ilja Everilä Aug 13 '18 at 13:42
  • @mad_ flush() took 287 seconds. – bigEvilBanana Aug 13 '18 at 13:52
  • @IljaEverilä sorry, I'm new to this field. Where can I find best practices or examples of how to use COPY? As for the table, I have an index on the primary key (id). – bigEvilBanana Aug 13 '18 at 13:55
  • One correction: you might want to remove the extra index=True on your primary key; a primary key already gets its own index, so you don't need another one. You can also drop autoincrement; it's the default for a single integer primary key. Still wondering why it takes 300 secs for just 1000 rows. – mad_ Aug 13 '18 at 14:05
  • Nah! Just tried to run the script on the server and got 1.7 seconds. Now I'm just wondering why it runs so slowly when I launch it from my local machine. – bigEvilBanana Aug 13 '18 at 14:25
  • 714 ms on this machine to insert 1000 rows into a table like yours. Are you using a local database, or connecting over the internet? If so, how's your connection? Here you can find the details on COPY: https://www.postgresql.org/docs/current/static/sql-copy.html, and here's how you'd use it with SQLA: https://stackoverflow.com/questions/13125236/sqlalchemy-psycopg2-and-postgresql-copy – Ilja Everilä Aug 13 '18 at 14:28
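Regarding the COPY suggestion in the comments above, a rough, untested sketch of one way to stream the rows through psycopg2's copy_expert() is shown below. It assumes the db engine from the question is backed by psycopg2 and that the data column can be written as a PostgreSQL array literal; the buffer layout (tab-separated text) is an illustration, not code from the thread.

import io

# Render the rows as tab-separated text that COPY ... FROM STDIN understands.
# The ARRAY(FLOAT) column is written in PostgreSQL array-literal form: {a,b,c}.
buf = io.StringIO()
for index in range(len(quotes['timestamp'])):
    row = [
        int(quotes['timestamp'][index]), quotes['open'][index],
        quotes['high'][index], quotes['low'][index],
        quotes['close'][index]
    ]
    array_literal = '{' + ','.join(str(value) for value in row) + '}'
    buf.write(ticker + '\t' + array_literal + '\n')
buf.seek(0)

# Borrow the raw psycopg2 connection from the SQLAlchemy engine ("db").
raw_conn = db.raw_connection()
try:
    with raw_conn.cursor() as cur:
        cur.copy_expert('COPY quotes (ticker, data) FROM STDIN', buf)
    raw_conn.commit()
finally:
    raw_conn.close()

COPY avoids the per-row INSERT round trips entirely, which is why it tends to win by a wide margin for bulk loads, especially over a slow network link.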

1 Answer


To increase performance you can use the bulk_save_objects() method; it looks like the fastest of the ORM-level options.

You can compare the performance of the different approaches here: http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow
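As an illustration only (not code from the answer), the loop from the question could be reshaped around bulk_save_objects() roughly like this, reusing the quotes dict and ticker variable from the question:

# Build all Quotes objects in memory first, then save them in one call;
# bulk_save_objects() skips most of the per-object ORM bookkeeping.
objects = [
    Quotes(
        ticker=ticker,
        data=[
            int(quotes['timestamp'][index]), quotes['open'][index],
            quotes['high'][index], quotes['low'][index],
            quotes['close'][index]
        ]
    )
    for index in range(len(quotes['timestamp']))
]

session.bulk_save_objects(objects)
session.commit()

For even bigger batches, bulk_insert_mappings() with plain dicts or the Core insert() construct avoid building ORM objects at all, which is what the linked FAQ entry benchmarks.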

Rafał