I have a program that works with Postgres using psycopg2, but inserting into the database takes too long.

Here are the results of profiling using cProfile.

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   475    0.022    0.000  855.605    1.801  /home/user/my_project/db.py:197(insert_event)
   475    0.012    0.000  293.347    0.618  /home/user/my_project/db.py:123(insert_meta)
   475    0.026    0.000  276.814    0.583  /home/user/my_project/db.py:102(insert_image)
  2375    0.022    0.000  598.542    0.252  /usr/local/lib/python2.7/dist-packages/psycopg2/extras.py:286(execute)
  1425  251.676    0.177  251.676    0.177  {method 'commit' of 'psycopg2._psycopg.connection' objects}
   475    0.005    0.000   33.028    0.070  /home/user/my_project/db.py:83(is_event)

Conclusions:

Inserting the full information about one event - 1.8 sec
Inserting a picture (average) - 0.583 sec
Inserting the metadata about an event (average) - 0.618 sec
Committing a transaction (average) - 0.177 sec
Checking whether a record already exists in the DB (average) - 0.070 sec

Here is the code that works with psycopg2.

import psycopg2
import psycopg2.extras


class PostgresDb(object):
    def __init__(self, host, port, name, user, password, connect=True):
        self.host = host
        self.port = port
        self.name = name
        self.user = user
        self.password = password
        self.connection = None
        self.cursor = None
        if connect:
            self.connect()

    def __str__(self):
        return ' '.join(map(str, [self.host,
                                  self.port,
                                  self.name,
                                  self.user]))

    def connect(self):
        try:
            self.connection = psycopg2.connect(host=self.host,
                                               port=self.port,
                                               user=self.user,
                                               password=self.password,
                                               database=self.name)
            self.cursor = self.connection.cursor(cursor_factory=psycopg2.extras.NamedTupleCursor)

        except psycopg2.DatabaseError as e:
            print e
            return e.pgerror

    def execute(self, query, commit=True, repeat=True):
        if self.connection is None:
            self.connect()
        try:
            self.cursor.execute(query)
            if commit:
                self.connection.commit()
        except psycopg2.Error as e:
            print e
            self.connection = None
            return repeat and self.execute(query, commit=commit, repeat=False)
        else:
            return True

What am I doing wrong?

Maybe you have some ideas about why it takes so long.

Viach Kakovskyi
  • See http://stackoverflow.com/q/12206600/398670 . Use `COPY`, batches, `synchronous_commit = off`... lots of different options. – Craig Ringer Jan 23 '15 at 20:54
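
For reference, the COPY route from that comment might look roughly like the sketch below. The events table, its (id, payload) columns, and the connection parameters are made-up placeholders; adapt them to your schema.

import io

import psycopg2

conn = psycopg2.connect(host='localhost', port=5432, user='user',
                        password='secret', database='mydb')
cur = conn.cursor()

# Build one tab-separated buffer for all rows, then stream it to the server
# with a single COPY instead of one INSERT (and one commit) per row.
buf = io.BytesIO()  # on Python 3, use io.StringIO and text rows
for event_id, payload in [(1, 'a'), (2, 'b'), (3, 'c')]:  # placeholder data
    buf.write('%d\t%s\n' % (event_id, payload))
buf.seek(0)

cur.copy_from(buf, 'events', columns=('id', 'payload'))
conn.commit()  # one commit for the whole batch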

2 Answers

Based on the results from your profiling, it looks like you are committing over a thousand transactions (the profile shows 1,425 commit() calls) and incurring the associated overhead for each of those commits (some more in-depth discussion of that here and here).

If your requirements do not necessarily stipulate such granular transaction boundaries, one option you may have is to batch multiple inserts together into a single transaction and then execute commit() a single time for that batch. For what you've got posted here, that may equate to something like this silly approximation:

db = PostgresDb(...your connection stuff here...)
#
# Other stuff happens, including determining your INSERT queries
#
for query in my_insert_queries[:-1]:
    db.execute(query, commit=False)  # run each insert, but hold the commit
db.execute(my_insert_queries[-1], commit=True)  # commit once for the whole batch

I'm sure there's a million ways to dice this onion, depending on what the rest of your code looks like - the core of the suggestion is to reduce the number of committed transactions.
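
If all of the batched statements insert into the same table, a related option (available in psycopg2 2.7 and later) is psycopg2.extras.execute_values, which folds many rows into a single multi-row INSERT. A minimal sketch, assuming a made-up events(id, payload) table and placeholder connection settings:

import psycopg2
import psycopg2.extras

conn = psycopg2.connect(host='localhost', port=5432, user='user',
                        password='secret', database='mydb')
cur = conn.cursor()

rows = [(1, 'a'), (2, 'b'), (3, 'c')]  # placeholder event data

# One multi-row INSERT and one commit, instead of a transaction per row.
psycopg2.extras.execute_values(
    cur, "INSERT INTO events (id, payload) VALUES %s", rows)
conn.commit()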

rchang

Psycopg2 was slow for me too. Low CPU usage in Python, low CPU usage in Postgres, and queries took a long time. I don't know if it was my database connection or what, but it was intolerable.

Try this: when you want to run a bunch of queries, make sure each one ends with a semicolon and concatenate them all into one string, rather than running them individually. Then execute that one huge string as a single query at the end (see the sketch below). This won't work if you need to SELECT things from the database halfway through your transaction and use the results in your Python script, but that's probably a rare case anyway. This sped up my transactions immensely.
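
A rough sketch of what that might look like with psycopg2, using cursor.mogrify to keep the parameter escaping safe; the events table, its columns, and the connection parameters are placeholders:

import psycopg2

conn = psycopg2.connect(host='localhost', port=5432, user='user',
                        password='secret', database='mydb')
cur = conn.cursor()

# Render each INSERT with its parameters safely interpolated, accumulate the
# statements, then run the whole batch in a single execute() and commit().
statements = []
for event_id, payload in [(1, 'a'), (2, 'b'), (3, 'c')]:  # placeholder data
    statements.append(cur.mogrify(
        "INSERT INTO events (id, payload) VALUES (%s, %s);",
        (event_id, payload)))

cur.execute(''.join(statements))  # on Python 3, mogrify returns bytes: b''.join
conn.commit()

As the comment below notes, this multi-statement trick is specific to psycopg2; psycopg3 handles multiple statements in one query differently.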

sudo
  • Downvote was premature imho. Maybe provide the explanation too? Sounds a little funky if you haven't read this: https://www.psycopg.org/psycopg3/docs/basic/from_pg2.html#multiple-statements-in-the-same-query ;) – David Jun 20 '23 at 21:58