45

What is the best way to insert a Python dictionary with many keys into a Postgres database without having to enumerate all keys?

I would like to do something like...

song = dict()
song['title'] = 'song 1'
song['artist'] = 'artist 1'
...

cursor.execute('INSERT INTO song_table (song.keys()) VALUES (song)')
tobias
user3783608

8 Answers

51
from psycopg2.extensions import AsIs

song = {
    'title': 'song 1',
    'artist': 'artist 1'
}

columns = song.keys()
values = [song[column] for column in columns]

insert_statement = 'insert into song_table (%s) values %s'

cursor.execute(insert_statement, (AsIs(','.join(columns)), tuple(values)))
print cursor.mogrify(insert_statement, (AsIs(','.join(columns)), tuple(values)))

Prints:

insert into song_table (artist,title) values ('artist 1', 'song 1')

Psycopg adapts a tuple to a record and AsIs does what would be done by Python's string substitution.
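For contrast, a quick sketch (reusing columns, values, insert_statement and cursor from above) of what happens without AsIs: the joined column names are adapted as an ordinary parameter and quoted as a string literal, which is not valid for identifiers.

# Without AsIs the column list is quoted as a string literal, yielding
# invalid SQL along the lines of:
#   insert into song_table ('artist,title') values ('artist 1', 'song 1')
no_asis = cursor.mogrify(insert_statement, (','.join(columns), tuple(values)))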

Clodoaldo Neto
  • Didn't know about `AsIs`. Interesting -- saves needing to deal with the multiplied-out `%s`s... – khampson Apr 07 '15 at 00:33
  • I was concerned this may leave the user vulnerable to injection attacks due to inadequate escaping, but - at least using a basic example - single apostrophes appear to be properly escaped. I don't have time to test more advanced injection techniques, such as (or similar to) the ones described at the following link, so they may still be a concern vs. more standard parameterization techniques. http://stackoverflow.com/a/12118602/2540707 – evan_b Mar 23 '17 at 01:57
  • why not just use song.values() for the values? :) – Brian Wylie Apr 04 '18 at 20:52
  • If I want to get the id of that inserted row, then what to do? – Pabitra Roy Jul 09 '20 at 08:25
  • Getting mogrify requires a psycopg2.extensions.cursor but received a 'str' for insert_statement which is clearly a string. – John Barton Aug 24 '20 at 19:14
  • It looks like this was written in python2, before dictionaries were ordered. Nowadays, you can use .keys() and .values() and order will be maintained. – Justin Furuness Apr 16 '21 at 02:01
35

You can also insert multiple rows using a sequence of dictionaries. If you had the following:

namedict = ({"first_name":"Joshua", "last_name":"Drake"},
            {"first_name":"Steven", "last_name":"Foo"},
            {"first_name":"David", "last_name":"Bar"})

You could insert all three rows by using:

cur = conn.cursor()
cur.executemany("""INSERT INTO bar(first_name,last_name) VALUES (%(first_name)s, %(last_name)s)""", namedict)

The cur.executemany statement will automatically iterate through the sequence of dictionaries and execute the INSERT query for each row.

PS: This example is taken from here
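Since the question is about avoiding an explicit list of keys, a hedged sketch that combines executemany with placeholders generated from the first row's keys might look like this (the connection parameters and the bar table are placeholders; it assumes every dict has the same keys and that those keys are trusted column names):

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder connection
cur = conn.cursor()

namedict = ({"first_name": "Joshua", "last_name": "Drake"},
            {"first_name": "Steven", "last_name": "Foo"},
            {"first_name": "David", "last_name": "Bar"})

# Build the column list and the named placeholders from the first row's keys.
cols = list(namedict[0].keys())
placeholders = ", ".join("%({})s".format(c) for c in cols)

cur.executemany(
    "INSERT INTO bar ({}) VALUES ({})".format(", ".join(cols), placeholders),
    namedict)

conn.commit()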

vikas
15

Something along these lines should do it:

song = dict()
song['title'] = 'song 1'
song['artist'] = 'artist 1'

cols = list(song.keys())

vals = [song[x] for x in cols]
vals_str_list = ["%s"] * len(vals)
vals_str = ", ".join(vals_str_list)

cursor.execute("INSERT INTO song_table ({cols}) VALUES ({vals_str})".format(
               cols=", ".join(cols), vals_str=vals_str), vals)

The key part is the generated string of %s placeholders, which is substituted in via format, while the list of values is passed directly to the execute call so that psycopg2 can interpolate each item in vals (thus preventing possible SQL injection).

Another variation, passing the dict to execute, would be to use these lines instead of vals, vals_str_list and vals_str from above:

vals_str2 = ", ".join(["%({0})s".format(x) for x in cols])

cursor.execute("INSERT INTO song_table ({cols}) VALUES ({vals_str})".format(
               cols=", ".join(cols), vals_str=vals_str2), song)
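For the two-key song above, a quick sketch of what the formatted statements look like before psycopg2 binds the parameters (reusing the variables defined above; on Python 3.7+ the column order follows the dict's insertion order):

query_tmpl = "INSERT INTO song_table ({cols}) VALUES ({vals_str})"

# First variation: positional placeholders, executed with the vals list
print(query_tmpl.format(cols=", ".join(cols), vals_str=vals_str))
# -> INSERT INTO song_table (title, artist) VALUES (%s, %s)

# Second variation: named placeholders, executed with the song dict itself
print(query_tmpl.format(cols=", ".join(cols), vals_str=vals_str2))
# -> INSERT INTO song_table (title, artist) VALUES (%(title)s, %(artist)s)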
khampson
  • I would also replace `cols` with `[cursor.mogrify(x) for x in cols]`, and the same for `vals_str`, to thwart SQL injections. – 9000 Apr 05 '15 at 22:31
  • Agreed that that would certainly add extra protection. – khampson Apr 05 '15 at 22:40
  • Reading the psycopg2 doc a bit, `mogrify` may be unnecessary, as the definition of that method says that **The string returned is exactly the one that would be sent to the database running the execute() method or similar.**, so I think the columns and `%s` strings will be `mogrify`'ed during the `execute` call. – khampson Apr 05 '15 at 22:43
  • I think the main difference would be whether it did it up front, before calling `execute`, or during the `execute` itself. If done before, it would essentially just return the same string during the `execute` run, since it would've already been mogrified. If you wanted to know exactly what was going over the wire, say, for your logs, it may be beneficial to do ahead of time. – khampson Apr 05 '15 at 22:53
8

The new sql module was created for this purpose and added in psycopg2 version 2.7. According to the documentation:

If you need to generate dynamically an SQL query (for instance choosing dynamically a table name) you can use the facilities provided by the psycopg2.sql module.

Two examples are given in the documentation: https://www.psycopg.org/docs/sql.html

names = ['foo', 'bar', 'baz']

q1 = sql.SQL("insert into table ({}) values ({})").format(
    sql.SQL(', ').join(map(sql.Identifier, names)),
    sql.SQL(', ').join(sql.Placeholder() * len(names)))
print(q1.as_string(conn))

insert into table ("foo", "bar", "baz") values (%s, %s, %s)

q2 = sql.SQL("insert into table ({}) values ({})").format(
    sql.SQL(', ').join(map(sql.Identifier, names)),
    sql.SQL(', ').join(map(sql.Placeholder, names)))
print(q2.as_string(conn))

insert into table ("foo", "bar", "baz") values (%(foo)s, %(bar)s, %(baz)s)

Though string concatenation would produce the same result, it should not be used for this purpose, according to the psycopg2 documentation:

Warning: Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
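A hedged sketch applying this to the original song dict (it assumes an open connection conn and an existing song_table with matching columns):

from psycopg2 import sql

song = {'title': 'song 1', 'artist': 'artist 1'}

# Column names become safely quoted identifiers; the values stay ordinary
# query parameters bound by the driver.
query = sql.SQL("insert into song_table ({}) values ({})").format(
    sql.SQL(', ').join(map(sql.Identifier, song.keys())),
    sql.SQL(', ').join(map(sql.Placeholder, song.keys())))

with conn.cursor() as cursor:
    cursor.execute(query, song)
conn.commit()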

Antoine Dusséaux
1

Another approach for querying MySQL or PostgreSQL from a dictionary is to use the %(dic_key)s construction: each placeholder is replaced by the value from the dictionary corresponding to dic_key, e.g. {'dic_key': 'dic value'}. It works well and prevents SQL injection. Tested with Python 2.7; see below:

in_dict = {u'report_range': None, u'report_description': None, 'user_id': 6, u'rtype': None, u'datapool_id': 1, u'report_name': u'test suka 1', u'category_id': 3, u'report_id': None}

cursor.execute(
    'INSERT INTO report_template (report_id, report_name, report_description, report_range, '
    'datapool_id, category_id, rtype, user_id) '
    'VALUES (DEFAULT, %(report_name)s, %(report_description)s, %(report_range)s, '
    '%(datapool_id)s, %(category_id)s, %(rtype)s, %(user_id)s) '
    'RETURNING "report_id";', in_dict)


OUT: INSERT INTO report_template (report_id, report_name, report_description, report_range, datapool_id, category_id, rtype, user_id) VALUES (DEFAULT, E'test suka 1', NULL, NULL, 1, 3, NULL, 6) RETURNING "report_id";
Victor Orletchi
1

Clodoaldo's answer gets simpler with Python 3's ordered-dict guarantee (insertion order is preserved since Python 3.7):

from psycopg2.extensions import AsIs

song = dict(title='song 1', artist='artist 1')

insert_statement = 'insert into song_table (%s) values %s'
cursor.execute(insert_statement, (AsIs(','.join(song.keys())), tuple(song.values())))
hansaplast
0

Using execute_values (https://www.psycopg.org/docs/extras.html) is faster and has a fetch argument to return something. Below is some code that might help. columns is a string like col_name1, col_name2, and template is the string that allows the matching, like %(col_name1)s, %(col_name2)s.


from psycopg2.extras import RealDictCursor, execute_values


def insert(cur: RealDictCursor,
           table_name: str,
           values: list[dict],
           returning: str = ''):
    if not values:
        return []

    # Look up the target table's column names (here: only columns without a default).
    # NOTE: table_name is interpolated directly into the SQL, so it must be a trusted value.
    query = f"""SELECT
                    column_name AS c
                FROM
                    information_schema.columns
                WHERE
                    table_name = '{table_name}'
                AND column_default IS NULL;"""
    cur.execute(query)
    columns_names = cur.fetchall()

    fetch = False
    if returning:
        returning = f'RETURNING {returning}'
        fetch = True

    columns = []
    template = []
    for col in columns_names:
        col_name = col['c']
        # Make sure every row dict has every column, filling gaps with None.
        for val in values:
            val.setdefault(col_name, None)

        columns.append(col_name)
        template.append(f'%({col_name})s')

    query = f"""INSERT INTO {table_name}
                    ({', '.join(columns)})
                    VALUES %s {returning}"""
    return execute_values(cur, query, values,
                          template=f"({', '.join(template)})", fetch=fetch)
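A hypothetical usage sketch for the helper above (the connection parameters, the song_table name and its id column are placeholders):

import psycopg2
from psycopg2.extras import RealDictCursor

conn = psycopg2.connect("dbname=test user=postgres")
with conn.cursor(cursor_factory=RealDictCursor) as cur:
    # fetch=True is set internally because returning is passed, so the new ids come back.
    new_ids = insert(cur, 'song_table',
                     [{'title': 'song 1', 'artist': 'artist 1'}],
                     returning='id')
conn.commit()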
-3

Python has built-in features such as join and list which can be used to generate the query. Also, the Python dictionary offers keys() and values(), which can be used to extract the column names and column values respectively. This is the approach I used, and it should work.

song = dict()
song['title'] = 'song 1'
song['artist'] = 'artist 1'

query = "insert into song_table (" + ",".join(song.keys()) + ") values " + str(tuple(song.values()))
cursor.execute(query)
Ragin Sharma
  • This is a very, very bad idea. To quote the psycopg documentation: "Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint." – Nathan Jan 10 '21 at 15:25