3

I am using Entity Framework Core with the NuGet package Npgsql.EntityFrameworkCore.PostgreSQL.

I have read all of the other answers regarding slow inserts to Entity Framework Core, but none have helped.

using (var db = getNewContext())
{
    db.Table1.Add(table1Object);
    db.SaveChanges();
}

This single insert takes around 20 to 30 seconds. There are fewer than 100 rows in the table. I put a stopwatch start and stop inside the `using` block to make sure the time is not due to context initialisation.
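For clarity, the timing harness looks roughly like this (a sketch; the output call is illustrative):

// Sketch of the timing (System.Diagnostics.Stopwatch); started inside
// the using block so context initialisation is excluded from the measurement.
using (var db = getNewContext())
{
    var sw = Stopwatch.StartNew();
    db.Table1.Add(table1Object);
    db.SaveChanges();
    sw.Stop();
    Console.WriteLine($"Insert took {sw.ElapsedMilliseconds} ms"); // consistently ~20,000-30,000 ms
}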

Here is my class for the table object (relevant property names changed):

public partial class Table1Object
{
    public long Id { get; set; }
    public Guid SessionId { get; set; }
    public DateTime Timestamp { get; set; }
    public long MyNumber1 { get; set; }
    public double MyNumber2 { get; set; }
    public double MyNumber3 { get; set; }
    public double MyNumber4 { get; set; }
    public long? ParentId { get; set; }
    public bool MyBool { get; set; }
}

SessionId is used to link to another table (Session table), but I have not explicitly defined a foreign key or any other constraints for this anywhere. ParentId is also used to link back to another row in the same table, but I have not explicitly defined a constraint for this.

Running the equivalent code on a different table takes less than a second to insert a single row. Table2 has fewer columns, but I wouldn't have thought the row sizes differ enough to have such a drastic effect:

public partial class Table2Object
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
}

Using Serilog and Entity Framework Core logging, you can see that the delay is in the "Committing transaction" step, which takes about 26 seconds; the insert itself takes only 6ms (some parts of the log statements are trimmed for brevity):

2021-04-08 11:20:36.874 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table1'.
2021-04-08 11:20:36.879 [DBG] Context 'DataContext' started tracking 'Table1' entity.
2021-04-08 11:20:36.880 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.881 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.905 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.906 [DBG] Opening connection to database
2021-04-08 11:20:36.907 [DBG] Opened connection to database
2021-04-08 11:20:36.908 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.909 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.912 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.913 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.914 [DBG] Executing DbCommand [Parameters= ...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.920 [INF] Executed DbCommand (6ms) Parameters=[...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.925 [DBG] The foreign key property 'Table1.Id' was detected as changed.
2021-04-08 11:20:36.930 [DBG] A data reader was disposed.
2021-04-08 11:20:36.931 [DBG] Committing transaction.
2021-04-08 11:21:02.729 [DBG] Committed transaction.
2021-04-08 11:21:02.730 [DBG] Closing connection to database

Here is the equivalent log when inserting into Table2. The insert takes 3ms and the commit takes 75ms. This is how fast it should be:

2021-04-08 11:20:36.459 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table2'.
2021-04-08 11:20:36.460 [DBG] Context 'DataContext' started tracking 'Table2' entity.
2021-04-08 11:20:36.461 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.462 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.463 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.464 [DBG] Opening connection to database
2021-04-08 11:20:36.465 [DBG] Opened connection to database
2021-04-08 11:20:36.466 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.467 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.468 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.469 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.470 [DBG] Executing DbCommand [Parameters=...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.472 [INF] Executed DbCommand (3ms) [Parameters=[...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.474 [DBG] The foreign key property 'Table2.Id' was detected as changed.
2021-04-08 11:20:36.475 [DBG] A data reader was disposed.
2021-04-08 11:20:36.476 [DBG] Committing transaction.
2021-04-08 11:20:36.551 [DBG] Committed transaction.
2021-04-08 11:20:36.552 [DBG] Closing connection to database
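For reference, the EF Core messages in the logs above were surfaced through Serilog roughly like this. This is a sketch: LogTo is EF Core 5's simple-logging hook, and the exact wiring in the real project may differ.

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using Serilog;

public class DataContext : DbContext
{
    public DbSet<Table1Object> Table1 { get; set; }
    public DbSet<Table2Object> Table2 { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseNpgsql(StaticConfig.ConnectionString)
            // Forward EF Core's own messages (SaveChanges, transactions, SQL) to Serilog
            .LogTo(Log.Debug, LogLevel.Debug);
}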

I am at a loss as to what is different between the tables, other than the slightly bigger row size. I've dropped and recreated the table in case there were any constraints, foreign keys, triggers, etc. that I was unaware of.

An "Explain" plan for the insert generates:

"Insert on ""Table1""  (cost=0.00..0.01 rows=1 width=81)"
"  ->  Result  (cost=0.00..0.01 rows=1 width=81)"

Enabling "Show query log" for postgresql gives about the same amount of information as entity framework logging does:

2021-04-09 12:05:06.559 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.560 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table1" (...)
    VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
    RETURNING "Id"
2021-04-09 12:05:06.560 BST [1979] user1@database DETAIL:  parameters: $1 = '0.580484961751977', $2 = 'f', $3 = '0.205387434417341', $4 = '18', $5 = '148', $6 = '93c71fb5-836a-486a-8d82-e073743b41cd', $7 = '2021-04-09 11:04:58.123773', $8 = '1.15474773024298'
2021-04-09 12:05:06.565 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:47.352 BST [1443] postgres@database LOG:  statement: /*pga4dash*/
    SELECT 'session_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'tps_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'ti_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'to_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    UNION ALL
    SELECT 'bio_stats' AS chart_name, row_to_json(t) AS chart_data
    FROM ...
    
2021-04-09 12:05:51.148 BST [1979] user1@database LOG:  statement: DISCARD ALL

You can see that after the COMMIT statement, about 41 seconds passed before the next statement, which does some internal chart logging for pgAdmin. 41 seconds just to commit a single-row insert!

Compare this to an insert into Table2, where the commit takes only about 100 milliseconds:

2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.097 BST [1979] user1@database LOG:  execute <unnamed>: INSERT INTO "Table2" ("Name", "Timestamp", "Value")
    VALUES ($1, $2, $3)
    RETURNING "Id"
2021-04-09 12:05:06.097 BST [1979] user1@database DETAIL:  parameters: $1 = 'Test', $2 = '2021-04-09 11:05:06.096182', $3 = '98'
2021-04-09 12:05:06.098 BST [1979] user1@database LOG:  statement: COMMIT
2021-04-09 12:05:06.189 BST [1979] user1@database LOG:  statement: DISCARD ALL

I ran the following statement directly in pgAdmin, and it reported that it took 323ms:

BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO "Table1" ("MyColumn1", "MyColumn2", "MyColumn3", "MyColumn4", "ParentId", "SessionId", "Timestamp", "MyColumn5")
    VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')
    RETURNING "Id";
COMMIT;

I also tried running the statement directly using Npgsql with the following C# code:

            _logger.Debug("Using connection");
            using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
            {
                _logger.Debug("connection.open");
                conn.Open();
                _logger.Debug("Using command");
                // Insert some data
                using (var cmd = new NpgsqlCommand(
                    " BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;" +
                    " INSERT INTO \"Table1\" (\"MyColumn1\", \"MyColumn2\", \"MyColumn3\", \"MyColumn4\", \"ParentId\", \"SessionId\", \"Timestamp\", \"MyColumn5\")" +
                    " VALUES ('0.580484961751977','f' , '0.205387434417341','18',  '148',  '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773',  '1.15474773024298')" +
                    " RETURNING \"Id\";" +
                    "COMMIT;"
                    , conn))
                {
                    _logger.Debug("command execute");
                    cmd.ExecuteNonQuery();
                }
            }
            _logger.Debug("Done");

The log statements in that code told me that the whole thing took less than a second:

[21:59:41 DBG] Using connection
[21:59:41 DBG] connection.open
[21:59:42 DBG] Using command
[21:59:42 DBG] command execute
[21:59:42 DBG] Done

I've also dropped the database, deleted all of the Entity Framework migrations, and created a new initial migration, so everything is running from scratch. It still takes around 20 seconds to insert into Table1, but less than a second to insert into Table2.

Putting `Enlist=false` in the connection string doesn't help.
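For reference, it was applied like this (a sketch; NpgsqlConnectionStringBuilder is just one of several equivalent ways to set the flag):

// Sketch: opting out of ambient TransactionScope enlistment.
var csb = new NpgsqlConnectionStringBuilder(StaticConfig.ConnectionString)
{
    Enlist = false
};
using (var conn = new NpgsqlConnection(csb.ConnectionString))
{
    conn.Open();
    // ... same insert as before
}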

I would agree with @Mark G's comment that "The findings ... would suggest that the issue either lies upstream in EF Core or in the provider", but I'm unsure how to diagnose the problem further.

I've since changed the code to use Npgsql to insert rows into this table via raw SQL, and that's pretty fast: less than 100ms per insert. So the most likely candidate seems to be a bug in Entity Framework Core, but since I don't know what the issue is specifically, it makes raising a bug report with their team difficult.
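The workaround looks roughly like this (a sketch, not the exact production code; parameter names are illustrative and the column list follows the entity class shown above):

// Sketch of the raw-SQL workaround: a parameterised insert with Npgsql.
public long InsertTable1Row(Table1Object row)
{
    using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
    {
        conn.Open();
        using (var cmd = new NpgsqlCommand(
            "INSERT INTO \"Table1\" (\"SessionId\", \"Timestamp\", \"MyNumber1\", \"MyNumber2\", " +
            "\"MyNumber3\", \"MyNumber4\", \"ParentId\", \"MyBool\") " +
            "VALUES (@sessionId, @timestamp, @n1, @n2, @n3, @n4, @parentId, @myBool) " +
            "RETURNING \"Id\";", conn))
        {
            cmd.Parameters.AddWithValue("sessionId", row.SessionId);
            cmd.Parameters.AddWithValue("timestamp", row.Timestamp);
            cmd.Parameters.AddWithValue("n1", row.MyNumber1);
            cmd.Parameters.AddWithValue("n2", row.MyNumber2);
            cmd.Parameters.AddWithValue("n3", row.MyNumber3);
            cmd.Parameters.AddWithValue("n4", row.MyNumber4);
            cmd.Parameters.AddWithValue("parentId", (object)row.ParentId ?? DBNull.Value);
            cmd.Parameters.AddWithValue("myBool", row.MyBool);
            return (long)cmd.ExecuteScalar(); // value from RETURNING "Id"
        }
    }
}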

F Chopin
  • Have you run an [explain plan](https://www.postgresql.org/docs/current/using-explain.html) in PostgreSQL? – KennetsuR Apr 08 '21 at 16:01
  • I've run an explain plan for the insert and updated the question with the result. It doesn't give much information. We can see that the insert is only taking 6ms anyway, so I think the delay is coming from somewhere else. – F Chopin Apr 08 '21 at 16:33
  • Could you check for locking or other bad queries on the same Table1? Can you check the Table1 SQL log? – KennetsuR Apr 08 '21 at 16:52
  • I am fairly new to PostgreSQL; how do I check the SQL log? I have pgAdmin v4.5 – F Chopin Apr 08 '21 at 17:01
  • You can try [show query log](https://tableplus.com/blog/2018/10/how-to-show-queries-log-in-postgresql.html) – KennetsuR Apr 08 '21 at 17:09
  • @KenTsu I've added the output for "show query log" for both an insert into Table1 and Table2. The difference in insert time between the tables is absolutely mind-boggling. – F Chopin Apr 09 '21 at 11:21
  • Probably a cut and paste issue, but `INSERT INTO INSERT INTO "Table1"` looks suspect. Have you tried just running the insert statement directly in PG Admin? If that works, then try running the statement using [ADO.NET](https://www.npgsql.org/doc/index.html) and see how fast that runs. – Mark G Apr 09 '21 at 16:49
  • The creation of the (My)DbContext could be expensive. You should use IoC to inject one, not "new it up" on demand. See: https://stackoverflow.com/questions/13551100/when-should-i-create-a-new-dbcontext/56187006#56187006 – granadaCoder Apr 09 '21 at 20:54
  • @MarkG That was just a copy paste error. I've updated my post with the findings from running the query in PG Admin and ADO.NET, they both take less than a second. – F Chopin Apr 09 '21 at 21:04
  • @granadaCoder We already know from the logs that the delay is in the COMMIT stage, and not in the creation / disposal of the DbContext – F Chopin Apr 09 '21 at 21:06
  • 1
    Based on your findings it would suggest that the issue either lies upstream in EF Core or in the provider - for the latter I'd also tag `npgsql` in your question. Also try using `Enlist=false` in the [connection string](https://www.npgsql.org/doc/connection-string-parameters.html). – Mark G Apr 09 '21 at 21:49
  • @Dominik I'm sure there are no other running processes accessing that database, and the table is empty. There are no indexes, rules or triggers on the table. Also, the slow insert speed varies between 20 - 40 seconds, but there has never been a case where it's less than 20 seconds, so the delay seems repeatable. – F Chopin Apr 12 '21 at 14:18
  • @Karl I read your question in more detail and found my questions pointless. You already tried the insert directly on the database which worked at normal performance, sorry. Maybe take a look with Wireshark or similar and check if the db causes the delay or it's something on clientside. If that points to the client side, next step is really to just clone entity framework and debug... – Dominik Apr 12 '21 at 14:20
  • @Karl Using https://learn.microsoft.com/en-us/visualstudio/debugger/decompilation?view=vs-2019 you should be able to step into the EF sources with your VS debugger. – Dominik Apr 12 '21 at 14:28
  • What version of `Npgsql.EntityFrameworkCore.PostgreSQL` are you using? Can you set `context.Database.AutoTransactionsEnabled = false` and see if that makes any difference? – Mark G Apr 14 '21 at 00:50
  • @MarkG AutoTransactionsEnabled = false doesn't help. I'm using v5.0.2 – F Chopin Apr 14 '21 at 08:01
  • 1
    Are you able to just submit a repro for this? That seems like it should be possible since you say you've dropped and recreated your database, and are seeing the same thing. If you can submit a minimal schema and C# code sample as an issue on https://github.com/npgsql/efcore.pg, it can be investigated. – Shay Rojansky Apr 14 '21 at 11:39
  • See my answer. Should I update the title or question body after essentially finding out that the question is full of red herrings? – F Chopin Apr 23 '21 at 00:15

3 Answers

1

After a lot of testing, I eventually found that the problem was not in Entity Framework or Npgsql at all; the delay I was seeing was caused by write caching. I was always writing a 30MB file before inserting a row into Table1, and I believed the file write was finished once File.WriteAllBytes returned, so it wouldn't impact any later timing statements. However, at the OS layer the data had not actually been written to disk by the time the insert statement ran, which artificially delayed the insert.

I proved this with the following code:

Stopwatch sw1 = new Stopwatch();
sw1.Start();
File.WriteAllBytes(filePath, myBytes); // writes the ~30MB file
sw1.Stop();

Thread.Sleep(1000);

Stopwatch sw2 = new Stopwatch();
sw2.Start();
MethodThatInsertsIntoTable1();
sw2.Stop();

Stopwatch 1 showed that File.WriteAllBytes always took about 500ms, while stopwatch 2 timed about 20 to 30 seconds.

If I change MethodThatInsertsIntoTable1 to insert into a different table, it still takes 20 to 30 seconds, regardless of the table.

If I increase Thread.Sleep(1000) to Thread.Sleep(30000), stopwatch 2 records that the insert takes less than 10 milliseconds.

This shows that even after File.WriteAllBytes returns control to the program, the file is not actually fully written to the disk.
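A sketch of one way to take write caching out of the equation, assuming the same filePath and myBytes as above: write through a FileStream and ask the OS to flush to the physical device before starting the insert timer.

// Sketch: force the data onto the disk before timing the insert.
// FileStream.Flush(true) flushes through the OS cache to the device.
using (var fs = new FileStream(filePath, FileMode.Create, FileAccess.Write))
{
    fs.Write(myBytes, 0, myBytes.Length);
    fs.Flush(flushToDisk: true);
}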

The environment I was running on was Linux on a Raspberry Pi. A write speed test confirmed that my write speed to the SD card is just over 1MB/s, which aligns with the results I was seeing: 20 to 30 seconds to write a 30MB file. It couldn't possibly have been done in the 500ms that stopwatch 1 reported.

Another user seems to have run into an issue caused by the same behaviour in File.WriteAllBytes does not block

After adding an external USB SSD to the Raspberry Pi and changing the code to save the file there instead, saving the file only takes 0.5 seconds and the problem goes away.

F Chopin
0

The main difference I can see between Table1Object and Table2Object is the presence of XxxId properties.

SessionId is used to link to another table (Session table), but I have not explicitly defined a foreign key or any other constraints for this anywhere. ParentId is also used to link back to another row in the same table, but I have not explicitly defined a constraint for this.

EF Core recognizes this pattern and, depending on your other tables, can create a relationship by convention, for example if the other table/entity has a navigation property to your Table1Object:

https://learn.microsoft.com/en-us/ef/core/modeling/relationships?tabs=fluent-api%2Cfluent-api-simple-key%2Csimple-key#conventions
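If a conventional relationship is what's at play, it can be pinned down explicitly in the model instead. A sketch, where the Session entity type is an assumption based on the question:

// Sketch: declaring the relationships explicitly in OnModelCreating.
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Table1Object>()
        .HasOne<Session>()              // hypothetical Session entity
        .WithMany()
        .HasForeignKey(t => t.SessionId);

    // Self-referencing link back to a parent row in the same table
    modelBuilder.Entity<Table1Object>()
        .HasOne<Table1Object>()
        .WithMany()
        .HasForeignKey(t => t.ParentId);
}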

With that in place, here I go with an educated guess: Serilog and EF Core have had some issues together. I am sure this is something temporary or already fixed, but I would try removing Serilog from the equation:

.NET 5 EF Core SaveChangesAsync hangs on errors

https://github.com/serilog/serilog-sinks-seq/issues/98

david-ao
  • Awarding the bounty to the best answer, but I haven't had time to check what the actual problem was yet. Will do so later and report back. – F Chopin Apr 16 '21 at 09:46
-1

Have you tried injecting the context instead of creating a new instance every time you save data?

private readonly DataContext _context;

// Let the DI container supply a single context instead of newing one up per save
public RaceService(DataContext context)
{
    _context = context;
}

public async Task<ReturnObject> MethodName(Table1Object table1Object)
{
    _context.Table1.Add(table1Object);
    await _context.SaveChangesAsync();
    return new ReturnObject();
}
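For completeness, the usual registration so the context can be injected looks something like this (a sketch; DataContext is the context name from the question's logs):

// e.g. in Startup.ConfigureServices
services.AddDbContext<DataContext>(options =>
    options.UseNpgsql(StaticConfig.ConnectionString));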
Nederxus
  • 65
  • 1
  • 5
  • Yes, I've tried that, and from the logs we can see the delay is happening in the commit stage, not in the creation or disposal of the context – F Chopin Apr 11 '21 at 23:06