
I have a CSV file with the following structure, as an example of one line: 01.01.2020;12:00:00;50;100;150

I have a SQLite database file with a table with the following column structure: DateAndTime as TEXT, Field1 as INTEGER, Field2 as INTEGER, Field3 as INTEGER, AddField as INTEGER (that field comes from user input and is NOT from the CSV). (I have no option to declare DateAndTime as DATETIME in my SQLite database.)
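For reference, a table matching that description might be created like this (a sketch only; the table name MyTable is taken from the INSERT statement further down, and the exact DDL is an assumption):

```sql
-- Sketch of the described schema; exact DDL is an assumption.
CREATE TABLE IF NOT EXISTS MyTable (
    DateAndTime TEXT,    -- combined date and time stored as text
    Field1      INTEGER,
    Field2      INTEGER,
    Field3      INTEGER,
    AddField    INTEGER  -- from user input, not from the CSV
);
```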

First of all, I think I can't do this with a bulk insert, because I have to convert some values into the right objects (the first and second CSV columns into ONE DateTime object, plus the last parameter from user input). The other CSV values are fine; they are simple integers.

Additionally, I need to check whether a data row I want to insert already exists. If it does, ignore that row and continue with the next one.

The steps I have done so far:

List<MyObject> MyObjectList = new List<MyObject>();
using (StreamReader file = new StreamReader(filepath))
{
    string ln;

    while ((ln = file.ReadLine()) != null)
    {
        if (!String.IsNullOrWhiteSpace(ln))
        {
            try
            {
                string[] ValuesStringArray = ln.Split(';');

                // first column is dd.MM.yyyy, second column is HH:mm:ss
                string[] DateArray = ValuesStringArray[0].Split('.');
                string[] TimeArray = ValuesStringArray[1].Split(':');

                DateTime dateTime = new DateTime(Int32.Parse(DateArray[2]),
                                                 Int32.Parse(DateArray[1]),
                                                 Int32.Parse(DateArray[0]),
                                                 Int32.Parse(TimeArray[0]),
                                                 Int32.Parse(TimeArray[1]),
                                                 Int32.Parse(TimeArray[2]));
                int field1 = Int32.Parse(ValuesStringArray[2]);
                int field2 = Int32.Parse(ValuesStringArray[3]);
                int field3 = Int32.Parse(ValuesStringArray[4]);
                int addField = Int32.Parse(USERINPUT); // from user input, not from the CSV

                MyObject myObject = new MyObject() { DateTime = dateTime, Field1 = field1, Field2 = field2, Field3 = field3, AddField = addField };

                MyObjectList.Add(myObject);
            }
            catch (FormatException)
            {
                // skip lines that cannot be parsed
            }
        }
    }
}

Then I iterate through MyObjectList and insert each object into the database with the following statement:

INSERT INTO MyTable (DateTime, Field1, Field2, Field3, AddField)
SELECT @DateTime, @Field1, @Field2, @Field3, @AddField
WHERE NOT EXISTS (SELECT 1 FROM MyTable
                  WHERE DateTime = @DateTime AND Field1 = @Field1 AND [..so on]);

The @ parameters are bound per row, and all inserts are wrapped in a single transaction.

My problem is that this procedure takes a very long time when the CSV file is 100,000 lines long. The time per insert grows with every row: the first insert takes 0–1 ms, and each subsequent one takes slightly longer. I'm fine with the first part of the code that creates and fills MyObjectList; that part works well and fast for me.

I need your help to make the insert transaction as fast as possible. Is there a better way to check for duplicates and skip a row when one is found? Any hints or examples would be helpful.

coderone
  • Try to wrap all insert statements with BEGIN/END as discussed at https://stackoverflow.com/questions/3852068/sqlite-insert-very-slow – Anand Chapla Jan 28 '20 at 08:11
  • @AnandChapla As I mentioned in my question, I already have this. The INSERT statements in the loop are wrapped in one transaction; they are not single hits on the database. – coderone Jan 28 '20 at 08:16
  • Then the `WHERE NOT EXISTS` query must be taking time to evaluate the condition. Can you please check the time difference between inserting duplicate records and inserting unique ones? – Anand Chapla Jan 28 '20 at 08:22
  • @AnandChapla When I insert 10,000 records into the empty table it takes 3.428 seconds. When I insert the same file again as duplicates it takes 3.223 seconds; that's faster because of the skipped inserts. But when I run the same test with 100,000 inserts, it takes over 5 minutes for the empty-database insert, and inserting the same file again takes just as long. Every single insert takes longer to execute: as mentioned in the question, the first is 0–1 ms, but then the time increases exponentially. – coderone Jan 28 '20 at 08:29
  • 1
    `INSERT OR IGNORE` and a unique index on all columns in the table? – Shawn Jan 28 '20 at 08:41
  • If your question is only about the insert part, I recommend: https://stackoverflow.com/questions/36778304/how-to-bulk-insert-into-sqlite-database, https://stackoverflow.com/questions/1609637/is-it-possible-to-insert-multiple-rows-at-a-time-in-an-sqlite-database, https://stackoverflow.com/questions/4356363/sqlite-net-performance-how-to-speed-up-things – xdtTransform Jan 28 '20 at 09:03
  • Note that SQLite has a built-in CSV import. – xdtTransform Jan 28 '20 at 09:06
  • And this gem: https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite – xdtTransform Jan 28 '20 at 09:10
  • @xdtTransform Thanks for the CSV import hint, I will have a look at it. But my hero is now @Shawn. I had read about that feature, but I didn't think it would speed things up so much. Thanks, it now runs in under 5 seconds for 100,000 rows. – coderone Jan 28 '20 at 09:13
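Shawn's suggestion from the comments can be sketched as follows (the index name idx_mytable_all is my own assumption; the column names follow the INSERT statement in the question):

```sql
-- One-time setup: a unique index over all inserted columns lets SQLite
-- detect duplicates itself via an index lookup.
CREATE UNIQUE INDEX IF NOT EXISTS idx_mytable_all
    ON MyTable (DateTime, Field1, Field2, Field3, AddField);

-- Per row: OR IGNORE silently skips any row that would violate the
-- unique index, so the per-row WHERE NOT EXISTS subquery is no longer needed.
INSERT OR IGNORE INTO MyTable (DateTime, Field1, Field2, Field3, AddField)
VALUES (@DateTime, @Field1, @Field2, @Field3, @AddField);
```

With this, the duplicate check costs one index lookup per row instead of a correlated subquery over a growing table, which is consistent with the speedup reported in the comments.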

0 Answers