I am using Entity Framework Core and the NuGet package Npgsql.EntityFrameworkCore.PostgreSQL.
I have read all of the other answers regarding slow inserts to Entity Framework Core, but none have helped.
using (var db = getNewContext())
{
    db.Table1.Add(Table1Object);
    db.SaveChanges();
}
This single insert takes around 20 to 30 seconds. There are fewer than 100 rows in the table. I have put a stopwatch start and stop inside the using block to make sure the time is not due to context initialisation.
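The timing check is just a Stopwatch around the two calls, along these lines (a sketch; getNewContext and _logger are from my real code, and table1Object stands in for the entity instance):

```csharp
using System.Diagnostics;

// Sketch of the timing check: the stopwatch is started after the
// context has been constructed, so context initialisation is excluded.
using (var db = getNewContext())
{
    var sw = Stopwatch.StartNew();
    db.Table1.Add(table1Object);
    db.SaveChanges();
    sw.Stop();
    _logger.Debug("Insert took {ElapsedMs} ms", sw.ElapsedMilliseconds);
}
```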
Here is my class for the table object (relevant property names changed):
public partial class Table1Object
{
    public long Id { get; set; }
    public Guid SessionId { get; set; }
    public DateTime Timestamp { get; set; }
    public long MyNumber1 { get; set; }
    public double MyNumber2 { get; set; }
    public double MyNumber3 { get; set; }
    public double MyNumber4 { get; set; }
    public long? ParentId { get; set; }
    public bool MyBool { get; set; }
}
SessionId is used to link to another table (the Session table), but I have not explicitly defined a foreign key or any other constraint for this anywhere. ParentId is also used to link back to another row in the same table, but I have not explicitly defined a constraint for this either.
Running the equivalent code on a different table takes less than a second to insert a single row. Table2 has fewer columns, but I wouldn't have thought the difference in row size would have such a drastic effect:
public partial class Table2Object
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }
    public double Value { get; set; }
}
Using Serilog and Entity Framework Core logging, you can see that the delay is in the "Committing transaction" step, which takes about 26 seconds; the insert itself takes only 6ms (some parts of the log statements are trimmed for brevity):
2021-04-08 11:20:36.874 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table1'.
2021-04-08 11:20:36.879 [DBG] Context 'DataContext' started tracking 'Table1' entity.
2021-04-08 11:20:36.880 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.881 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.905 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.906 [DBG] Opening connection to database
2021-04-08 11:20:36.907 [DBG] Opened connection to database
2021-04-08 11:20:36.908 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.909 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.912 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.913 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.914 [DBG] Executing DbCommand [Parameters= ...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.920 [INF] Executed DbCommand (6ms) Parameters=[...]
INSERT INTO "Table1" ("SessionId", "Timestamp" ...)
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)
RETURNING "Id";
2021-04-08 11:20:36.925 [DBG] The foreign key property 'Table1.Id' was detected as changed.
2021-04-08 11:20:36.930 [DBG] A data reader was disposed.
2021-04-08 11:20:36.931 [DBG] Committing transaction.
2021-04-08 11:21:02.729 [DBG] Committed transaction.
2021-04-08 11:21:02.730 [DBG] Closing connection to database
Here is the equivalent log when inserting to Table2. The insert takes 3ms and the commit takes 75ms. This is how fast it should be:
2021-04-08 11:20:36.459 [DBG] 'DataContext' generated a temporary value for the property 'Id.Table2'.
2021-04-08 11:20:36.460 [DBG] Context 'DataContext' started tracking 'Table2' entity.
2021-04-08 11:20:36.461 [DBG] SaveChanges starting for 'DataContext'.
2021-04-08 11:20:36.462 [DBG] DetectChanges starting for 'DataContext'.
2021-04-08 11:20:36.463 [DBG] DetectChanges completed for 'DataContext'.
2021-04-08 11:20:36.464 [DBG] Opening connection to database
2021-04-08 11:20:36.465 [DBG] Opened connection to database
2021-04-08 11:20:36.466 [DBG] Beginning transaction with isolation level 'Unspecified'.
2021-04-08 11:20:36.467 [DBG] Began transaction with isolation level 'ReadCommitted'.
2021-04-08 11:20:36.468 [DBG] Creating DbCommand for 'ExecuteReader'.
2021-04-08 11:20:36.469 [DBG] Created DbCommand for 'ExecuteReader' (0ms).
2021-04-08 11:20:36.470 [DBG] Executing DbCommand [Parameters=...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.472 [INF] Executed DbCommand (3ms) [Parameters=[...]
INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES (@p0, @p1, @p2)
RETURNING "Id";
2021-04-08 11:20:36.474 [DBG] The foreign key property 'Table2.Id' was detected as changed.
2021-04-08 11:20:36.475 [DBG] A data reader was disposed.
2021-04-08 11:20:36.476 [DBG] Committing transaction.
2021-04-08 11:20:36.551 [DBG] Committed transaction.
2021-04-08 11:20:36.552 [DBG] Closing connection to database
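For reference, the EF Core messages above were routed to Serilog. A minimal sketch of that wiring (assuming EF Core 5's LogTo overload and Serilog's static Log class; the sink configuration itself is not shown) looks like:

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using Serilog;

public partial class DataContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseNpgsql(StaticConfig.ConnectionString)
            // Forward EF Core's diagnostic messages (SaveChanges, transaction
            // begin/commit, DbCommand timings) to Serilog at Debug level.
            .LogTo(message => Log.Debug(message), LogLevel.Debug);
    }
}
```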
I am at a loss as to what is different between the two tables other than Table1's slightly larger row size. I've dropped and recreated the table in case there were any constraints, foreign keys, triggers, etc. that I was unaware of.
An EXPLAIN plan for the insert generates:
Insert on "Table1"  (cost=0.00..0.01 rows=1 width=81)
  ->  Result  (cost=0.00..0.01 rows=1 width=81)
Enabling "Show query log" for PostgreSQL gives about the same amount of information as the Entity Framework Core logging does:
2021-04-09 12:05:06.559 BST [1979] user1@database LOG: statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.560 BST [1979] user1@database LOG: execute <unnamed>: INSERT INTO "Table1" (...)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
RETURNING "Id"
2021-04-09 12:05:06.560 BST [1979] user1@database DETAIL: parameters: $1 = '0.580484961751977', $2 = 'f', $3 = '0.205387434417341', $4 = '18', $5 = '148', $6 = '93c71fb5-836a-486a-8d82-e073743b41cd', $7 = '2021-04-09 11:04:58.123773', $8 = '1.15474773024298'
2021-04-09 12:05:06.565 BST [1979] user1@database LOG: statement: COMMIT
2021-04-09 12:05:47.352 BST [1443] postgres@database LOG: statement: /*pga4dash*/
SELECT 'session_stats' AS chart_name, row_to_json(t) AS chart_data
FROM ...
UNION ALL
SELECT 'tps_stats' AS chart_name, row_to_json(t) AS chart_data
FROM ...
UNION ALL
SELECT 'ti_stats' AS chart_name, row_to_json(t) AS chart_data
FROM ...
UNION ALL
SELECT 'to_stats' AS chart_name, row_to_json(t) AS chart_data
FROM ...
UNION ALL
SELECT 'bio_stats' AS chart_name, row_to_json(t) AS chart_data
FROM ...
2021-04-09 12:05:51.148 BST [1979] user1@database LOG: statement: DISCARD ALL
You can see that after the COMMIT statement, about 41 seconds passed before the next statement, which just does some internal chart logging. 41 seconds just to commit a single-row insert!
Compare this to an insert into Table2: the commit takes only about 100 milliseconds!
2021-04-09 12:05:06.097 BST [1979] user1@database LOG: statement: BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED
2021-04-09 12:05:06.097 BST [1979] user1@database LOG: execute <unnamed>: INSERT INTO "Table2" ("Name", "Timestamp", "Value")
VALUES ($1, $2, $3)
RETURNING "Id"
2021-04-09 12:05:06.097 BST [1979] user1@database DETAIL: parameters: $1 = 'Test', $2 = '2021-04-09 11:05:06.096182', $3 = '98'
2021-04-09 12:05:06.098 BST [1979] user1@database LOG: statement: COMMIT
2021-04-09 12:05:06.189 BST [1979] user1@database LOG: statement: DISCARD ALL
I ran the following statement directly in pgAdmin, and it reported that it took 323ms:
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO "Table1" ("MyColumn1", "MyColumn2", "MyColumn3", "MyColumn4", "ParentId", "SessionId", "Timestamp", "MyColumn5")
VALUES ('0.580484961751977','f' , '0.205387434417341','18', '148', '93c71fb5-836a-486a-8d82-e073743b41cd','2021-04-09 11:04:58.123773', '1.15474773024298')
RETURNING "Id";
COMMIT;
I also tried running the statement directly using Npgsql, with the following C# code:
_logger.Debug("Using connection");
using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
{
    _logger.Debug("connection.open");
    conn.Open();
    _logger.Debug("Using command");
    // Insert some data
    using (var cmd = new NpgsqlCommand(
        "BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;" +
        " INSERT INTO \"Table1\" (\"MyColumn1\", \"MyColumn2\", \"MyColumn3\", \"MyColumn4\", \"ParentId\", \"SessionId\", \"Timestamp\", \"MyColumn5\")" +
        " VALUES ('0.580484961751977', 'f', '0.205387434417341', '18', '148', '93c71fb5-836a-486a-8d82-e073743b41cd', '2021-04-09 11:04:58.123773', '1.15474773024298')" +
        " RETURNING \"Id\";" +
        " COMMIT;",
        conn))
    {
        _logger.Debug("command execute");
        cmd.ExecuteNonQuery();
    }
}
_logger.Debug("Done");
The log statements in that code told me that the whole thing took less than a second:
[21:59:41 DBG] Using connection
[21:59:41 DBG] connection.open
[21:59:42 DBG] Using command
[21:59:42 DBG] command execute
[21:59:42 DBG] Done
I've also dropped the database, deleted all the migrations from Entity Framework, and created a new initial migration, so everything is running from scratch. It still takes around 20 seconds to insert into Table1, but less than a second to insert into Table2.
Putting Enlist=false in the connection string doesn't help.
I would agree with @Mark G's comment that "The findings ... would suggest that the issue either lies upstream in EF Core or in the provider", but I'm unsure how to diagnose the problem further.
I've since changed the code to use Npgsql to insert rows into this table via raw SQL, and that is fast: less than 100ms per insert. The most likely candidate therefore seems to be a bug in Entity Framework Core, but since I don't know what the issue is specifically, it is difficult to raise a bug report with their team.
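For completeness, the raw-SQL workaround looks roughly like this (a sketch, parameterised rather than string-concatenated; column names are as in the pgAdmin test above, and the values shown are illustrative):

```csharp
using (var conn = new NpgsqlConnection(StaticConfig.ConnectionString))
{
    conn.Open();
    using (var cmd = new NpgsqlCommand(
        "INSERT INTO \"Table1\" (\"MyColumn1\", \"MyColumn2\", \"MyColumn3\", \"MyColumn4\"," +
        " \"ParentId\", \"SessionId\", \"Timestamp\", \"MyColumn5\")" +
        " VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7)" +
        " RETURNING \"Id\";", conn))
    {
        // Parameterised values; Npgsql maps the CLR types to PostgreSQL types.
        cmd.Parameters.AddWithValue("p0", 0.580484961751977);
        cmd.Parameters.AddWithValue("p1", false);
        cmd.Parameters.AddWithValue("p2", 0.205387434417341);
        cmd.Parameters.AddWithValue("p3", 18L);
        cmd.Parameters.AddWithValue("p4", 148L);
        cmd.Parameters.AddWithValue("p5", Guid.Parse("93c71fb5-836a-486a-8d82-e073743b41cd"));
        cmd.Parameters.AddWithValue("p6", DateTime.Parse("2021-04-09 11:04:58.123773"));
        cmd.Parameters.AddWithValue("p7", 1.15474773024298);

        // ExecuteScalar returns the value produced by RETURNING "Id".
        var newId = (long)cmd.ExecuteScalar();
    }
}
```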