I have a CSV file with the following structure; here is an example of one line:
01.01.2020;12:00:00;50;100;150
I have a SQLite database file with a table with the following column structure: DateAndTime as TEXT, Field1 as INTEGER, Field2 as INTEGER, Field3 as INTEGER, AddField as INTEGER (that last field comes from user input and is not part of the CSV).
(I have no option to declare DateAndTime as DATETIME in my SQLite database.)
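To make that concrete, the table looks roughly like this (MyTable is the placeholder name I also use in the INSERT further down; the column names are as described above):
CREATE TABLE MyTable (
    DateAndTime TEXT,
    Field1 INTEGER,
    Field2 INTEGER,
    Field3 INTEGER,
    AddField INTEGER
);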
First of all, I think I can't do this with a bulk insert because I have to convert some values into the right types (the first and second CSV columns into ONE DateTime object, plus the last parameter from user input). The other CSV values are fine; they are simple integers.
Additionally, I need to check whether a row I want to insert already exists in the table. If it does, I want to ignore it and continue with the next row.
Here are the steps I have taken so far:
List<MyObject> MyObjectList = new List<MyObject>();

using (StreamReader file = new StreamReader(filepath))
{
    string ln;
    while ((ln = file.ReadLine()) != null)
    {
        if (!String.IsNullOrWhiteSpace(ln))
        {
            try
            {
                // split the CSV line into its five fields
                string[] ValuesStringArray = ln.Split(';');
                string[] DateArray = ValuesStringArray[0].Split('.');   // dd.MM.yyyy
                string[] TimeArray = ValuesStringArray[1].Split(':');   // HH:mm:ss

                // combine the date and time parts into one DateTime object
                DateTime dateTime = new DateTime(int.Parse(DateArray[2]),
                                                 int.Parse(DateArray[1]),
                                                 int.Parse(DateArray[0]),
                                                 int.Parse(TimeArray[0]),
                                                 int.Parse(TimeArray[1]),
                                                 int.Parse(TimeArray[2]));

                int field1 = int.Parse(ValuesStringArray[2]);
                int field2 = int.Parse(ValuesStringArray[3]);
                int field3 = int.Parse(ValuesStringArray[4]);
                int addField = int.Parse(USERINPUT blabla);   // AddField comes from user input

                MyObject myObject = new MyObject() { DateTime = dateTime, Field1 = field1, Field2 = field2, Field3 = field3, AddField = addField };
                MyObjectList.Add(myObject);
                // ...
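(As a side note, I guess the date and time could also be parsed in one step with DateTime.ParseExact instead of splitting manually, assuming the format is always dd.MM.yyyy and HH:mm:ss; this needs using System.Globalization:)
DateTime dateTime = DateTime.ParseExact(
    ValuesStringArray[0] + " " + ValuesStringArray[1],   // e.g. "01.01.2020 12:00:00"
    "dd.MM.yyyy HH:mm:ss",
    CultureInfo.InvariantCulture);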
Then I iterate through MyObjectList and insert each object into the database with the following statement:
INSERT INTO MyTable (DateAndTime, Field1, Field2, Field3, AddField) SELECT @DateTime, @Field1, @Field2, @Field3, @AddField WHERE NOT EXISTS (SELECT 1 FROM MyTable WHERE DateAndTime = @DateTime AND Field1 = @Field1 AND [..so on]);
The @ placeholders are filled via command parameters, and all inserts are wrapped in a single transaction.
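In simplified form, the insert loop looks roughly like this (a sketch, not my exact code: I am assuming the System.Data.SQLite provider here, connectionString and insertSql are placeholders, and insertSql holds the statement above):
using (SQLiteConnection conn = new SQLiteConnection(connectionString))
{
    conn.Open();
    using (SQLiteTransaction transaction = conn.BeginTransaction())
    using (SQLiteCommand cmd = new SQLiteCommand(insertSql, conn))
    {
        cmd.Transaction = transaction;
        foreach (MyObject obj in MyObjectList)
        {
            cmd.Parameters.Clear();
            // DateAndTime is stored as TEXT; the exact format string here is just an assumption
            cmd.Parameters.AddWithValue("@DateTime", obj.DateTime.ToString("yyyy-MM-dd HH:mm:ss"));
            cmd.Parameters.AddWithValue("@Field1", obj.Field1);
            cmd.Parameters.AddWithValue("@Field2", obj.Field2);
            cmd.Parameters.AddWithValue("@Field3", obj.Field3);
            cmd.Parameters.AddWithValue("@AddField", obj.AddField);
            cmd.ExecuteNonQuery();   // runs the INSERT ... WHERE NOT EXISTS for this row
        }
        transaction.Commit();   // one transaction for all inserts
    }
}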
My problem is that this procedure takes a very long time when the CSV file is 100,000 lines long. The time per insert keeps increasing: the first inserts take 0-1 ms, but each insert gets slower the more rows are already in the table. I suspect the NOT EXISTS subquery has to scan the growing table for every row, but I'm not sure. The first part of the code, building and filling MyObjectList, is fine; that works fast enough for me.
I need your help to make the insert part as fast as possible. Is there a better way to check for duplicates and skip them? Any hints or examples would be helpful.