I am working on a daily automated job which checks the records in a text file against oracle database. We receive the text file everyday from an external team which contains records around 100,000. The text file will be in unix format which has 6 columns seperated by | symbol.
eg, HDR 1
home/sample/file|testvalue1|testvalue2|testval3|testval4|testval5
TRL
I need to check whether the values in testval3 and testval5 exists in my table in oracle database. The table has around 10 million records. I am currently processing it through shellscript. Inside the shellscript, i am reading the text file and iterating through each line in a loop. Inside the loop I am passing the value from each line and running the query against DB. If the records don’t exist in DB, I have to output those to a csv file. The below query is used:
select ‘testval3’,’testval5’ from dual
where not exists (select primarykeycolumn
from mytable where mycolumn=testval3 and mycolumn2=testval5)
Since the input file has 100000 entries, my loop will run the query 100000 times and every time it will check the table with 10 million records. This is making my batch job to run for many hours and I have to terminate it. Is there a better way to handle this situation? I can also use java if there is no better way to do this via shellscript.