Abstract question: When importing a record (= table row) from some programs to a database, I sometimes want to only import the record if and only if the record isn't already present in the database, based on some unique identifier. Is it preferable to:
- first query the database to check whether the record with the same identifier is already present in the database, and if not, insert the record,
- or add some unicity constraint in the database, then try to insert the record, surrounded by some
try ... catch
?
Concrete example: I have a data file containing a list of users. For each user I have a first name
and last name
. I assume that two people may not have the same name. I want to write a Python script that inserts the data in a PostgreSQL table, whose columns are userid
(auto-incremented primary key), first_name
(text), and last_name
(text).
I want to insert the user John Doe
. Is it preferable to:
first query the database to check whether the record with the same record is already present in the database, and if not, insert the record, e.g.:
sql = "SELECT COUNT(*) cnt FROM users WHERE first_name = 'John' AND last_name`= 'Doe'" data = pd.read_sql_query(sql, connection) if data['cnt'][0]==0: sql = "INSERT INTO users (first_name, last_name) values ('John','Doe')" cursor.execute(sql) connection.commit()
or add some unicity constraint in the database (on
(first_name, last_name)
), then try to insert the record, surrounded by sometry … catch
, e.g."try: sql = "INSERT INTO users (first_name, last_name) values ('John','Doe')" cursor.execute(sql) connection.commit() except: print('The user is already present in database')
?