0

Requirement:

  1. Please consider a spring batch application.
  2. Input is a file containing a column of values.
  3. The Spring Batch is designed as chunk oriented design.
  4. Chunk is designed to take 1000 records at a time
  5. Therefore, Reader reads 1000 records from a file in microseconds.
  6. Processor takes one record at a time and triggers the SQL query:

select * from TABLE where COLUMN2 = "record"

There may be only one record or multiple records retrieved and those records go through some business logic.

  1. In writer, we accumulate all the records passed by the business logic (number of records will be less than 1000) and inserts into the database.

Problem here: Consider the table has almost 400K records stored.

While reading 1000 records from the file, it takes few microseconds.

While processing the 1000 records (that means, hitting the above SQL query for 1000 times in the database), it takes 4 minutes to process.

While writing into the database (insertion of for example 100 selected records), it takes few microseconds.

While analyzing, I found that there is only the Primary Key column indexed in the table. The column that we are using (column2) is not included as an indexed column.

Please advise, whether adding a column as an index is a better solution to this.

Pratim Singha
  • 569
  • 2
  • 10
  • 33
  • Can the column2 be UNIQUE? If not creating index to the column2 would best suit here as you're performing SELECT more number of times. However, can we change the processor to hit the DB only once or at least lesser times? Can you provide more details on why it should hit DB for every record? – Govind Jun 28 '20 at 15:58
  • COLUMN2 is NOT unique. Please consider that we cannot increase the size of the processor as of now. On the question on why its hitting the DB for every record; as its in requirement that we need to check whether the value in the file does exists in the database or not. If it exists, it would retrieve the record(s) and modify few more column values according to business needs and then insert into the Database table as a new entry. The values in the files which does not exist in the database will be ignored. – Pratim Singha Jun 28 '20 at 16:25
  • Is that possible to hold all the values in an arrayList and hit DB only once or lesser times based on the size and do your business logic afterwards? – Govind Jun 28 '20 at 16:29
  • I am not sure about the efficiency of hitting an SQL query as mentioned in the description with 1000 values in the where clause – Pratim Singha Jun 28 '20 at 16:34
  • Yeah., by creating an index we could improve the performance to some extent. Also, single DB hit or less than 10 DB hits are far better than 1000 DB hits. – Govind Jun 28 '20 at 16:38

1 Answers1

2

select * from TABLE where COLUMN2 = "record"

Please advise, whether adding a column as an index is a better solution to this.

Yes, adding an index to the column(s) used in your where clause should improve performance, in your case, it is COLUMN2.

Mahmoud Ben Hassine
  • 28,519
  • 3
  • 32
  • 50