
I have this locally defined Python function that works fine when inserting data into a Redshift table:

def _insert_data(table_name, values_list):
    insert_base_sql = f"INSERT INTO {table_name} VALUES"
    insert_sql = insert_base_sql + ','.join([str(row) for row in values_list])
    <run_local_python_code_to_execute_insert_sql>

The values_list is a list of tuples, each with the same number of elements as columns in table_name (even though I don't explicitly assert/check that in this function). However, I'm failing to find a way to insert a NULL value for a smallint column. Here's the schema for the table in question (no DEFAULT values were assigned to columns at table creation):

 schemaname |      tablename      |    column    |         type          | encoding | distkey | sortkey | notnull
------------+---------------------+--------------+-----------------------+----------+---------+---------+---------
 public     | table               | col1         | bigint                | lzo      | t       |       1 | f
 public     | table               | col2         | date                  | lzo      | f       |       2 | f
 public     | table               | col3         | smallint              | lzo      | f       |       3 | f
 public     | table               | col4         | smallint              | lzo      | f       |       4 | f
 public     | table               | col5         | double precision      | none     | f       |       0 | f
 public     | table               | col6         | bigint                | lzo      | f       |       0 | f
 public     | table               | col7         | character varying(48) | bytedict | f       |       0 | f

I'm specifically trying to insert NULL values for col3 and col4; I've tried creating tuples with '' and 'NULL' but run into this error: Error: invalid input syntax for integer: "NULL".

For what it's worth, here's what a scrubbed row ends up looking like in the INSERT statement: ('bigint_value', 'dt_value', 'NULL', 'NULL', 'double_value', 'bigint_value', 'string_name')
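For reference, here's a minimal reconstruction of how the statement gets built (table and values are made up for illustration). `str()` on a tuple wraps string elements in quotes, which is why `'NULL'` reaches the database as a string literal rather than the NULL keyword:

```python
# Hypothetical row mirroring the scrubbed example above; "NULL" is a
# Python string, so str() renders it quoted in the final SQL.
values_list = [(1, "2017-11-27", "NULL", "NULL", 1.5, 42, "string_name")]
insert_sql = "INSERT INTO my_table VALUES " + ",".join(str(row) for row in values_list)
print(insert_sql)
# INSERT INTO my_table VALUES (1, '2017-11-27', 'NULL', 'NULL', 1.5, 42, 'string_name')
```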

Scott Borden
  • Have you tried using the value of `None` in the place where you want the `NULL`s to be? – scwagner Nov 27 '17 at 23:11
  • Yup - that was my first thought too; resulted in this error: `Error: column "none" does not exist in table_name` (rows looked like `('bigint_value', 'dt_value', None, None, 'double_value', 'bigint_value', 'string_name')`) – Scott Borden Nov 27 '17 at 23:30
  • To insert null use the keyword NULL with no quotes. – Scott Marlowe Nov 28 '17 at 00:37
  • Yup - @alecxe's answer below helped. Moving away from `','.join([str(row) for row in values_list])` & using placeholders instead worked. – Scott Borden Nov 29 '17 at 18:32

1 Answer


The approach you are taking is dangerous in itself. Constructing queries with string concatenation and formatting is error-prone and insecure: it leaves your query vulnerable to SQL injection, and it is also the direct cause of your NULL problem, since `str()` quotes every value.

Instead, properly parameterize your query, passing the list of row tuples as a separate argument to `cursor.executemany()`. Here is one (not especially pretty) way to generate the placeholders:

# One "%s" per column - each row tuple has len(values_list[0]) values;
# len(values_list) would count rows instead and produce the wrong number.
placeholders = ", ".join(["%s"] * len(values_list[0]))
query = f"""
    INSERT INTO 
        {table_name} 
    VALUES
        ({placeholders})
"""
cursor.executemany(query, values_list)

(note that table name cannot be parameterized - sanitize and validate it separately)
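As a minimal sketch of what that validation could look like (`quote_identifier` is a hypothetical helper, not part of any driver; the allow-list regex is an assumption you should adapt to your naming rules):

```python
import re

def quote_identifier(name):
    # Accept only plain identifiers (letters, digits, underscores,
    # not starting with a digit), then double-quote the result.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid table name: {name!r}")
    return f'"{name}"'

print(quote_identifier("my_table"))  # "my_table"
```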

Note the use of executemany() - it executes the prepared statement once for every tuple in the values_list.
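Here is a self-contained sketch of that pattern. It uses the stdlib `sqlite3` module (with `?` placeholders) purely so the example runs without a database server; with psycopg2 against Redshift the placeholder style is `%s`, but the `executemany()` call and the `None`-to-NULL behavior are the same. Table and values are made up:

```python
import sqlite3

# In-memory table loosely mirroring the question's schema; sqlite3
# stands in for Redshift here only to make the example runnable.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE demo (col1 INTEGER, col2 TEXT, col3 INTEGER, "
    "col4 INTEGER, col5 REAL, col6 INTEGER, col7 TEXT)"
)

values_list = [
    (1, "2017-11-27", None, None, 1.5, 42, "first"),
    (2, "2017-11-28", 7, None, 2.5, 43, "second"),
]

# One placeholder per column of a row, not per row in the list.
placeholders = ", ".join(["?"] * len(values_list[0]))
cursor.executemany(f"INSERT INTO demo VALUES ({placeholders})", values_list)

# None round-trips as SQL NULL.
cursor.execute("SELECT col3, col4 FROM demo ORDER BY col1")
print(cursor.fetchall())  # [(None, None), (7, None)]
```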

But, if you are using psycopg2, there is an even better way to insert multiple records into the table - execute_values() - take a look at this answer.

And, coming back to your initial question - with this approach, `None` values are automatically converted to SQL NULL by the database driver; no quoting is needed.

alecxe
  • Noted re: sql injection concerns; I'll correct for that. I initially tried using `None` as well, but ran into this error: `Error: column "none" does not exist in table_name` – Scott Borden Nov 27 '17 at 23:33
  • @ScottBorden right, I suspect you have not used parameterized queries, None was inserted into the query as `None` "keyword" with no quotes around it - hence it was interpreted as a column name and not a column value. Please give this approach a try. – alecxe Nov 27 '17 at 23:36