How to improve performance of many appends to a list?

Question

How to improve a performance of the following code?

BANNED_DOORBOTS = {...}

async def execute_query(self, query):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(query)
            records = []
            async for row in cur:
                if row[0] not in BANNED_DOORBOTS:
                    records.append({
                        'key1': row[0],
                        'key2': row[1]
                    })

    return records

I don't want to check each time if row[0] not in BANNED_DOORBOTS. How to avoid this?

Usually, I have more than hundred (up to 20 000) elements in records. Maybe I can pre-allocate some space to avoid reallocations?

Martijn Pieters · Accepted Answer · 2017-05-18T10:49:50.110

You are rebuilding a list each time, from a database query.

I'd ask the database to not return records that are banned:

from psycopg2 import sql  # safe SQL composing

# Add a NOT IN clause to filter out banned doorbots, generating a
# separate parameter placeholder per value
query = sql.SQL(query) + sql.SQL(' WHERE ding_id NOT IN ({})').format(
    sql.SQL(', ').join([sql.Placeholder()] * len(BANNED_DOORBOTS)))
await cur.execute(query, BANNED_DOORBOTS)

I used the psycopg.sql framework to do the compositing here, but you could get away with string formatting too (use '%s' for placeholders).

Consider putting the BANNED_DOORBOTS set in a table in the database instead so you can use a WHERE ding_id NOT IN (SELECT id from BANNED_DOORBOTS WHERE id IS NOT NULL) subquery. That way you get better performance still (the database can optimise for this), and you don't have to generate placeholders.

Next, use a list comprehension to build the list. This is faster because it avoids repeated list.append lookups and method calls. Define your column names as a tuple and zip that together with each row:

keys = ('ding_id', 'doorbot_id', 'created_at', 'address', 'latitude', 
        'longitude', 'ding_kind')
return [dict(zip(keys, row)) async for row in cur]

The async for list comprehension syntax requires Python 3.6 or newer.

The aiopg driver lets you configure an alternative cursor factory, one that already produces dictionaries, which may be faster still. You then don't have to use any list comprehension at all:

from psycopg2.extras import RealDictCursor

# configure cursor to yield dictionaries rather than tuples
async with conn.cursor(cursor_factory=RealDictCursor) as cur:
    await cur.execute(query, BANNED_DOORBOTS)
    # directly return the cursor; have the caller do the async iteration
    return cur

If you don't want to make the caller responsible for looping, but have to produce a list, use the cursor.fetchall() method to produce that list; each element will be a dictionary:

# configure cursor to yield dictionaries rather than tuples
async with conn.cursor(cursor_factory=RealDictCursor) as cur:
    await cur.execute(query, BANNED_DOORBOTS)
    return await cur.fetchall()

Thank you very much for the answer. Could you please provide a full code of the algorithm with list comprehension, because I'm new in python and don't really understand what the final version will be (with `return [dict(zip(keys, row)) ...]`) — , May 18 '17 at 10:22
@ipetr: replace everything from `records = []` onwards. So the `return [...]` is inside the `with asyc ...` context managers. — Martijn Pieters, May 18 '17 at 10:24
Sorry, I'm not sure I completely understand how to do this. I will be very grateful if you paste a full code at the end of your answer — , May 18 '17 at 10:34
@ipetr: thing is, I offered several different alternatives; picking just one for a complete example would make things rather messy. Personally, I'd skip the comprehension altogether and go for the `RealDictRow` cursor factory here, and return the `cur` cursor directly, which is shown in the last example. In that example, the `return cur` is located in the same place as the list comprehension would go. Replace that line with the `keys = ...` and `return [...]` lines, same indentation. — Martijn Pieters, May 18 '17 at 10:42
@ipetr: the `async with conn.cursor(...):` context manager sets up a cursor for you; everything you do that needs access to the cursor is nested inside it. Since the `return [... for row in cur]` code uses the cursor, you indent it do be inside the context manager. — Martijn Pieters, May 18 '17 at 10:45
@ipetr: I also added another option: using a dictionary cursor factory and returning a list (no list comprehension needed). — Martijn Pieters, May 18 '17 at 10:50

How to improve performance of many appends to a list?

1 Answers1