0

This is CS50 Web Project1 stage where I have to import books.csv (containing isbn, title, author and year) with 5000 lines in it. The problem is that the import itself takes too long (about 10 lines per second) and it's not normal I suppose. How do I speed it up?

I have created a table with isbn, title, author and year rows all in varchar. I use postgesql. Next I wrote import.py , which looks like this

import csv
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker


engine = create_engine(os.getenv("DATABASE_URL"))
db = scoped_session(sessionmaker(bind=engine))

def main():
    f = open("books.csv")
    reader = csv.reader(f)
    for ISBN, title, author, year in reader:
        db.execute("INSERT INTO books (ISBN, title, author, year)      VALUES (:ISBN, :title, :author, :year)",
       {"ISBN":ISBN, "title":title, "author":author, "year":year})
    db.commit()

if __name__ == "__main__":
    main()

I expect the import to proceed in less than a minute, but now it's about 30-40 minutes.

  • 1
    Create a transaction and hence insert into the database as a bulk operation. Finally commit the transaction. – jens1o Aug 15 '19 at 12:07
  • @jens1o could you please describe this solution in more details? I'm only beginning my path in handling databases) – Konstantin Golubtsov Aug 15 '19 at 12:29
  • 1
    You should check the [docs](https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations) and this [question](https://stackoverflow.com/questions/3659142/bulk-insert-with-sqlalchemy-orm) – ron_g Aug 15 '19 at 13:51

1 Answers1

0

Given the performance you're observing (10 rows/ second), I would guess the latency of a request to your database is rather high (check it with ping). If that's the case, inserting multiple rows in a single query using INSERT .. VALUES (..), (..), (..) should help a lot.

To do that, you have to:

If your file was larger, I would suggest to look into using "COPY FROM" bulk loads, but there's no point doing that with only 5000 rows.

Nickolay
  • 31,095
  • 13
  • 107
  • 185