0

Abstract question: When importing a record (= table row) from some programs to a database, I sometimes want to only import the record if and only if the record isn't already present in the database, based on some unique identifier. Is it preferable to:

  • first query the database to check whether the record with the same identifier is already present in the database, and if not, insert the record,
  • or add some unicity constraint in the database, then try to insert the record, surrounded by some try ... catch

?


Concrete example: I have a data file containing a list of users. For each user I have a first name and last name. I assume that two people may not have the same name. I want to write a Python script that inserts the data in a PostgreSQL table, whose columns are userid (auto-incremented primary key), first_name (text), and last_name (text).

I want to insert the user John Doe. Is it preferable to:

  • first query the database to check whether the record with the same record is already present in the database, and if not, insert the record, e.g.:

    sql = "SELECT COUNT(*) cnt FROM users WHERE first_name = 'John' AND last_name`= 'Doe'"
    data = pd.read_sql_query(sql, connection)
    if data['cnt'][0]==0:
        sql = "INSERT INTO users (first_name, last_name) values ('John','Doe')"
        cursor.execute(sql)
        connection.commit()
    
  • or add some unicity constraint in the database (on (first_name, last_name)), then try to insert the record, surrounded by some try … catch, e.g."

    try:
        sql = "INSERT INTO users (first_name, last_name) values ('John','Doe')"
        cursor.execute(sql)
        connection.commit()
    except:
        print('The user is already present in database')
    

?

Community
  • 1
  • 1
Franck Dernoncourt
  • 77,520
  • 72
  • 342
  • 501

0 Answers0