
Running SaveChanges() for a large number of records works, but the performance is not acceptable.

Here is an example with two tables:

[contact] (id, name, zip, city, status)

[zip_city] (id, zip, city)

The status in table contact needs to be updated for all rows whose zip and city values are available in table zip_city.

It works and the performance is acceptable when the number of records is less than 10,000.

But for more than 10,000 records, the Visual Studio debugger complains that it takes too long while building the query qry: "The CLR was unable to transition from COM context 0xb67898 to COM context 0xb67728 for 60 seconds..."

var ctx = new DbContext();

var qry = ctx.contact.Join(ctx.zip_city, c => new { c.zip, c.city }, z => new { z.zip, z.city }, (c, z) => new { c, z })
           .Select(x => new dtoContact { id = x.c.id }).ToList();

foreach (var con in ctx.contact)
{
    if (qry.Any(x => x.id == con.id))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
}

ctx.SaveChanges();

The following code raises the same message when running ctx.SaveChanges():

var ctx = new DbContext();

var zc = ctx.zip_city.ToList();

foreach (var con in ctx.contact)
{
    if (zc.Any(x => x.zip == con.zip && x.city == con.city))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
}

ctx.SaveChanges();

Reference: the data transfer object (DTO) used above:

public class dtoContact 
{
    public int id { get; set; }
    public string name { get; set; }
    public string zip { get; set; }
    public string city { get; set; }
    public string status { get; set; }
}

Info: Visual Studio debugger message (Google translation from the German VS15 version):

The Managed Debugging Assistant "ContextSwitchDeadlock" encountered a problem in "C:\Projects\Sweepstakes_EF6\TrafficSightAct\bin\Debug\Sweepstakes.exe". Additional information: The CLR was unable to transition from COM context 0xb67898 to COM context 0xb67728 for 60 seconds. The thread that owns the target context/apartment is either waiting without pumping messages or processing a very long-running operation without pumping Windows messages. Such a situation typically degrades performance and may even cause the application to stop responding or memory usage to grow.

gh2018
  • What exception? – SᴇM Nov 28 '18 at 12:45
  • Can you post also the message of the exception what you are receiving? – nemesv Nov 28 '18 at 12:46
  • I'm assuming it's either a timeout (I doubt this, since 10000 is not a lot), or the amount of records is misleading and not actually relevant - but rather it's a data problem in one of the later records that violates a constraint or something. Either way - need to see details of that exception. –  Nov 28 '18 at 12:49
  • When you add a lot of entries, either use a bulk-SaveChanges extension or call SaveChanges() for batches of 100 entities or so. The only problem with query building and many instances was when trying to build a query like Numbers.Any(y => y == x.Number) for >1,200,000 entries in Numbers, in which case you should use .Contains instead of .Any – DevilSuichiro Nov 28 '18 at 12:57
  • https://translate.google.com/#de/en/ – stuartd Nov 28 '18 at 12:58
  • I tried the batches of 100, but it doesn't work, either. int count = 0; foreach (var con in ctx.contact) { if (zc.Any(x => x.zip == con.zip && x.city == con.city)) { con.status = "P/O"; } else { con.status = "???"; } ++count; if (count % 100 == 0) { ctx.SaveChanges(); } } – gh2018 Nov 28 '18 at 13:01
  • Your message is just Visual Studio complaining that it takes too long; you can simply hit `F5` (run) and the error will go away. That error won't happen when you run it without the debugger - it's just the debugger complaining. See this: https://stackoverflow.com/questions/578357/visual-studio-contextswitchdeadlock – Rand Random Nov 28 '18 at 13:22
  • Possible duplicate of [Visual Studio: ContextSwitchDeadlock](https://stackoverflow.com/questions/578357/visual-studio-contextswitchdeadlock) – Rand Random Nov 28 '18 at 13:23
  • @gh2018 if you want to check why performance is bad, use SQL Server Profiler to see what the actual SQL statement looks like. Joining on text columns is unusual - are the fields covered by indexes? If not, the server will have to scan both tables and match every row from one table against every other row on the other table – Panagiotis Kanavos Nov 28 '18 at 13:30
  • @gh2018 in any case ORMs like EF are meant for working with *entities*. Your query on the other hand just tries to perform an UPDATE FROM. That's definitely not related to entities, so you shouldn't use an ORM to do it. Write a relatively simple `UPDATE Contact SET Status = IIF(z.zip IS NULL,'???','P/O') ... FROM Contact LEFT JOIN zip_city z ON Contact.city = z.city AND ...` – Panagiotis Kanavos Nov 28 '18 at 13:33
  • SInce you are only checking if there is any id - you should make a `HashSet` with the ids and call `ids.Contains(con.Id);` in your for each this should greatly increase your Performance. - so instead of `.Select(x => new dtoContact { id = x.c.id }).ToList()` just call `.Select(x => x.c.id).ToHashSet()` – Rand Random Nov 28 '18 at 13:34
  • @Rand Random, you are right. Just hitting F5 (run) brings the process to the end, but it takes more than 80 seconds for 27,000 records. Thank you for the tip! – gh2018 Nov 28 '18 at 13:40
  • @gh2018 that's a problem with this code and probably missing indexes, not EF. Using any ORM for this job is simply wrong. They aren't meant to *replace* SQL. Even so, if the fields aren't indexed you'll get slow performance. 10K data is no data at all for a database – Panagiotis Kanavos Nov 28 '18 at 13:51
  • @gh2018 if you want to troubleshoot performance use SQL Server profiler to capture the query and check its execution plan in SSMS. I suspect it will show table scans instead of index seeks – Panagiotis Kanavos Nov 28 '18 at 13:52

2 Answers


In this case, I would recommend executing a raw SQL command instead.

for example:

var ctx = new DbContext();

string sql = "update contact c inner join zip_city z " +
             "on c.zip = z.zip and c.city = z.city set c.status = 'P/O'; " +
             "update contact set status = '???' where status <> 'P/O'; ";

ctx.Database.ExecuteSqlCommand(sql);

That should take just a few seconds for your 27,000 records, I think.

Please try.

alex
  • it is a good alternative solution, even if it is not related to SaveChanges(); thanks! – gh2018 Nov 28 '18 at 14:12
  • You assume the previous data in the field could not have been 'P/O'. Doing the update to '???' first, without a constraint, is closer to the original. – nvoigt Nov 28 '18 at 14:40

Thank you all for the valuable comments!

To keep the recommendations from the question comments above in one place, I am posting an answer here.

I tested the recommendations provided in the comments, with 27,000 records in table contact and 47,000 records in table zip_city, but without any database index changes (adding indexes was suggested by Panagiotis Kanavos).

The performance test results follow.

Original code 1: it takes 113 seconds.

var ctx = new DbContext();
var qry = ctx.contact.Join(ctx.zip_city, c => new { c.zip, c.city }, z => new { z.zip, z.city }, (c, z) => new { c, z })
           .Select(x => new dtoContact { id = x.c.id }).ToList();
foreach (var con in ctx.contact)
{
    if (qry.Any(x => x.id == con.id))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
}
ctx.SaveChanges();

Based on DevilSuichiro's comment: a bulk-SaveChanges extension is not available for DbContext with MySQL, so here SaveChanges() is simply called for batches of 100 entities. It takes 107 seconds.

var ctx = new DbContext();
var qry = ctx.contact.Join(ctx.zip_city, c => new { c.zip, c.city }, z => new { z.zip, z.city }, (c, z) => new { c, z })
           .Select(x => new dtoContact { id = x.c.id }).ToList();
int count = 0;
foreach (var con in ctx.contact)
{
    if (qry.Any(x => x.id == con.id))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
    ++count;
    if (count % 100 == 0)
    {
        ctx.SaveChanges();
    }
}
ctx.SaveChanges();
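One more variation I did not benchmark (so treat it as an untested sketch, not a measured result): EF6 runs change detection over every tracked entity on each query and each SaveChanges(), which gets slower as more entities are tracked. EF6 exposes `Configuration.AutoDetectChangesEnabled` and `ChangeTracker.DetectChanges()`, so detection can be switched off during the loop and run once per batch; the batch size of 100 is arbitrary:

```csharp
var ctx = new DbContext();
// Skip the automatic DetectChanges scan on every tracked-entity access.
ctx.Configuration.AutoDetectChangesEnabled = false;

var zc = ctx.zip_city.ToList();
int count = 0;
// Materialize contacts first so SaveChanges() inside the loop
// does not run while the query's data reader is still open.
foreach (var con in ctx.contact.ToList())
{
    con.status = zc.Any(x => x.zip == con.zip && x.city == con.city)
        ? "P/O"
        : "???";

    if (++count % 100 == 0)
    {
        ctx.ChangeTracker.DetectChanges(); // detect once per batch
        ctx.SaveChanges();
    }
}
ctx.ChangeTracker.DetectChanges();         // flush the final partial batch
ctx.SaveChanges();
```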

Based on Rand Random's comment, a HashSet is used to increase performance. It still takes more than 100 seconds.

var ctx = new DbContext();
var qry = ctx.contact.Join(ctx.zip_city, c => new { c.zip, c.city }, z => new { z.zip, z.city }, (c, z) => new { c, z }).Select(x => x.c.id);
var ids = new HashSet<int>(qry);
foreach (var con in ctx.contact)
{
    if (ids.Contains(con.id))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
}
ctx.SaveChanges();

Original code 2: it takes 67 seconds.

var ctx = new DbContext();
var zc = ctx.zip_city.ToList();
foreach (var con in ctx.contact)
{
    if (zc.Any(x => x.zip == con.zip && x.city == con.city))
    {
        con.status = "P/O";
    }
    else
    {
        con.status = "???";
    }
}
ctx.SaveChanges();

Based on Panagiotis Kanavos' comments: using an ORM (here, DbContext with SaveChanges()) for this kind of bulk UPDATE is really not a fitting solution.

I tested the code from alex and nvoigt; it takes less than 2 seconds.

var ctx = new DbContext();
string sql = "update contact set status = '???'; " +
    "update contact c inner join zip_city z on c.zip = z.zip and c.city = z.city set c.status = 'P/O'; ";
ctx.Database.ExecuteSqlCommand(sql);
keuleJ
gh2018