0

I use a SQL Server CE database for some simulation results. When I test the reading speed the benchmark-times differ greatly.

  • dbs are split to 4 .SDF files (4 quartals)
  • 302*525000 entries overall
  • four SqlCeConnections
  • they are opened before reading and stay open
  • all four databases are located on the same disk (SSD)
  • I use SqlCeDataReader (IMHO as low-level and fast as you can get)
  • Reading process is parallel

Simplified code

for (int run = 0; run < 4; run++)
{
    InitializeConnections();

    for (int reading = 0; reading < 6; reading++)
    {
        ResetMemoryObjects();

        Parallel.For(0, quartals.Count, (i) =>
                    {
                         values[i] = ReadFromSqlCeDb(i);
                    }
    }
}
  • connections are initialized each run 1 time
  • all readings take place in a simple for loop and are exactly the same
  • before each reading all the objects are reinitialized

These are the benchmark results I get:

enter image description here

At this point I'm honest - I have no idea why SQL Server CE behaves that way. Maybe someone can give me a hint?

Edit 1: I made a more in depth analysis of each step during the parallel reading. The following chart shows the steps, the "actual reading" is the part inside the while(readerdt.Read()) section.

enter image description here

Edit 2:

After ErikEJ' suggestion I added a TableDirect Approach and made 150 runs, 75 for SELECT and 75 for TableDirect. I summed up the Pre- & Postprocessing of the reading process, because this remains stable and nearly the same for all runs. What differs vastly is the actual reading process.

Every second run was done via TableDirect, so they both start to get drastic better results at around run 65 simultaneously. The range goes from 5.7 second up to 37.4 seconds.

This is the "acutal reading" code. (there are four different databases/sdf files with four different SqlCe connections. Tested on Ryzen7 8-Core CPU)

 private static List<List<double>> ReadDataFromDbTableDirect((double von, double bis) timepoints, List<(string compName, string resName)> components, int dbIdx)
    {

        var values = new List<List<double>>();
        for (int j = 0; j < components.Count; j++)
        {
            values.Add(new List<double>());
        }

        using (var command = sqlCeCon[dbIdx].CreateCommand())
        {
            command.CommandType = CommandType.TableDirect;
            command.CommandText = table.TableName;
            command.IndexName = "PKTIME";
            command.SetRange(DbRangeOptions.InclusiveStart | DbRangeOptions.InclusiveEnd, new object[]{timepoints.von},new object[]{timepoints.bis});

            using (var reader = (SqlCeDataReader)command.ExecuteReader(CommandBehavior.Default))
            {
                while (reader.Read())
                {
                    for (int j = 0; j < components.Count; j++)
                    {
                        if (!reader.IsDBNull(j))
                        {
                            if (j == 0)
                            {
                                values[j].Add(reader.GetInt32(j));
                            }
                            else
                            {
                                values[j].Add(reader.GetDouble(j));
                            }
                        }
                    }
                }
            }
        }

        return values;
    }

Still no idea why it has such a great delay. SDF looks like this enter image description here

Edit 3: Today I made the same approach with a single database instead of four databases (to exclude problems with Parallel/Tasks). While here TableDirect has a little advantage, the main problem of differing reading speed persists (the sdf data is the same so it is comparable). enter image description here

Edit 4: These are the results on another machine. Still large outbursts, but a bit more stable. Overall still same issue.

enter image description here

Edit 5: These are the results of a 4x smaller db - 500 runs. TableDirect & Select are (as in previous benchmarks) run alternately, but to better see the results in the graph they are shown in sequence. Notice that the overall time here is not 4-times smaller as you'd expect, but ~8-times smaller. Same problem with the high reading time in the beginning. Next I'll optimize ArrayPooling and stuff...

enter image description here

Edit 6: To further investigate I tried this on 2 different machines. Pc#2 has Win11Home, Ryzen 5, SSD (quite new) and no Antivir, Pc#3 has Win10Pro - pristine installation, everything deactivated (WinDefender), SSD, Ryzen 7. On Pc#3 there is just one peak (first run), on Pc#2 there are several besides the initial requests.

enter image description here enter image description here

Edit 7 : After ErikEJ's suggestion it might be due to an Index Rebuild, I tried several things.

  • A test the reading times after a fresh simulation (db is freshly built in the simulation)
  • B test the reading times after copying and loading a db from another folder and apply a Verify(SqlCeEngine) on the db (Verify)
  • C test the reading times after copying and loading a db from another folder (no special db treatment)
  • D test the reading times after copying and loading a db from another folder, then make a quick first call with one row of all cols (preparation call)
  • I also tested SqlCeEngine Repair & Compact. They had nearly the same results as B

enter image description here

  • It seems like a verification solves the problem with the initial reading speed. Unfortunately the verification itself takes quite long (>10s on big dbs). Is there a quicker solution for this?
  • Result D is a complete surprise to me (mind the different scale). I don't understand what is happening here.. Any guesses?
  • Result C shows the long reading times on initial readings, but not always, which is irritating. Maybe it is not the index rebuild which is causing this?
  • I still have very huge variations in the reading speed on bigger databases.

I'm currently working on an improved reading process with pointers/memory to reduce GC pressure. I will make the same tests today on another machine.

If anyone has an idea how to improve/stabilize reading speeds please let me know! Thanks in advance!

c_ph_r
  • 207
  • 3
  • 13
  • Maybe GC pressure. Do you use Direct access? – ErikEJ Aug 26 '22 at 12:43
  • It is possible. I don't call GC.Collect() manually normally, but after your suggestion I did so right after ResetMemoryObjects(). The result were a bit worse now ranging from 34s-69s. In other tests without manually GC.Collect I also had these reading times up to 68s, so this is not quite specific, but might be a hint. Next I will benchmark every step in the reading process to figure out if it's purely the while (readerdt.Read()) section... (what do you mean by direct access? The ssd is directly built in the computer not in any network or sth...) – c_ph_r Aug 26 '22 at 17:36
  • @ErikEJ I added the analysis chart in the main question. Still confused why this happens – c_ph_r Aug 26 '22 at 18:22
  • I think doing all this in parallel caused looking, as you are accessing a single file. What is your connection string? Maybe read only mode would help? Or no parallel. – ErikEJ Aug 27 '22 at 07:31
  • By direct access I mean this, can be 3 x faster than select - but you are keeping many things secret for us https://erikej.blogspot.com/2015/07/sql-server-compact-adonet-data-access.html – ErikEJ Aug 27 '22 at 07:32
  • @ErikEJ thank you so much for your suggestion. I already tried table direct before, but to have it doubleproof I reimplemented it and tested the code. I edited my answer above (Edit 2) – c_ph_r Aug 29 '22 at 11:51
  • @ErikEJ Today I tried the same with a single database to exclude errors from thread.pooling / parallel execution. The main difference in reading speed unfortunetly persists (made 50 runs, so 25 each approach, see Edit 3 in main answer). – c_ph_r Aug 30 '22 at 08:31
  • Virus scanner or similar? – ErikEJ Aug 30 '22 at 08:57
  • Now I tried to define double Arrays with a predefined size. I hoped that this instead of the List approach would fix some assumed memory-allocation issues. But it's still the same unfortunately. Next I will widen my testing on other machines and will deactivate antivir programs. Thanks for the hint!! – c_ph_r Aug 30 '22 at 13:06
  • @ErikEJ I reproduced the tests on two different machines (Edit#6), both no antivirus, one with a pristine windows installation. Unfortunately (or luckily? who knows) they both have peaks in the initial request. Do you have this kind of behaviour in your SqlCe db's too? Do you have any idea what causes this? – c_ph_r Sep 01 '22 at 08:54
  • Has the database been moved from another machine - could be index rebuild then – ErikEJ Sep 01 '22 at 10:49
  • @ErikEJ When there's another PC involved yes. Your hint with index rebuild was good and I tried several things. I made a new Edit (#7) with the results, which are not solving the problem unfortunately, but have some interesting surprises (at least to me). I really (!) appreaciate your help and would be glad to have your comment on this. – c_ph_r Sep 12 '22 at 07:54

0 Answers0