We have a large set of data (bulk data) that needs to be checked if the record is existing in the database.
We are using SQL Server2012/JPA/Hibernate/Spring.
What would be an efficient or recommended way to check if a record exists in the database?
Our entity ProductCodes has the following fields:
private Integer productCodeId // this is the PK
private Integer refCode1 // ref code 1-5 has a unique constraint
private Integer refCode2
private Integer refCode3
private Integer refCode4
private Integer refCode5
... other fields
The service that we are creating will be given a file where each line is a combination of refCode1-5.
The task of the service is to check and report all lines in the file that are already existing in the database.
We are looking at approaching this in two ways.
Approach1: Usual approach.
- Loop through each line and call the DAO to query the refCode1-5 if existing in the db.
//psuedo code for each line in the file call dao. pass the refCode1-5 to query (select * from ProductCodes where refCode1=? and refCode2=? and refCode3=? and refCode4=? and refCode5=?
- given a large list of lines to check, this might be inefficient since we will be invoking the DAO xxxx number of times. If the file say consists of 1000 lines to check, this will be 1000 connections to the DB
Approach2: Query all records in the DB approach
- We will query all records in the DB
- Create a hash map with concatenated refCode1-5 as keys
Loop though each line in the file validating against the hashmap
We think this is more efficient in terms of DB connection since it will not create 1000 connections to the DB. However, if the DB table has for example 5000 records, then hibernate/jpa will create 5000 entities in memory and probably crash the application
We are thinking of going for the first approach since refCode1-5 has a unique constraint and will benefit from the implicit index.
But is there a better way of approaching this problem aside from the first approach?