1

I have an email marketing web application written in MVC 4 and Entity Framework 6. The application stores more than 10K email addresses along with contact information in a sql database on a web farm.

Everything works fine while the database is small, a few hundred records. However, when I fully populate the database, it is painfully slow retrieving and writing each individual record using "Foreach" statements. For some operations it can take up to 14 mins to complete. I have tried to minimize the number of records I work with at any one time, but it make the application clumsy.

I was wondering, are there other methods I could use to speed this up. Perhaps SQL stored procedures, or something on GitHub that would make it easier. If there are, please let me know where I can learn more about it. Any help would be appreciated. Here is one of the queries.

private int AddOrUpdateCampaigns(Campaign campaign
        , IEnumerable<AssignEmailContacts> assignEmailContacts)
{
    DataBaseContext db = new DataBaseContext();
    int TotalAssigned = 0;
    if (assignEmailContacts == null) return(TotalAssigned);

    if (campaign.CampaignId != 0)
    {   
      foreach (var emailContact 
                in assignEmailContacts.Where(a => a.Assigned == false))
      {   
        campaign.EmailContacts.Remove(
                db.EmailContacts.Find(emailContact.EmailContactId));
      }
      foreach (var emailContact 
                in assignEmailContacts.Where(a => a.Assigned))
      {                            
        campaign.EmailContacts.Add(
                db.EmailContacts.Find(emailContact.EmailContactId));
            TotalAssigned += 1;
      }
    }
    return (TotalAssigned);
}
surfmuggle
  • 5,527
  • 7
  • 48
  • 77
user2789697
  • 227
  • 1
  • 7
  • 13
  • 11
    10K records is a small database :) – Sergey Berezovskiy May 08 '14 at 20:15
  • 2
    10K is not a large database. What do your queries look like? Are you trying to pull back every record all at once? – Dismissile May 08 '14 at 20:15
  • 1
    @Dismissile agree with you. It would be interesting to see query which executes 14 min on such database – Sergey Berezovskiy May 08 '14 at 20:16
  • 1
    I'm sure marcgravell will chime in on the implications of speed and EF, but I'd look in to SPROC or batch executing. If you're updating in an iteration (calling SaveChanges each time) it's definitely going to add latency. – Brad Christie May 08 '14 at 20:17
  • 2
    As the other have stated, 10K is a very small database, but to answer your questions, for bulk operations, Stored Procedures will be faster and should be used. Take a look at this. http://stackoverflow.com/questions/2698151/entity-framework-vs-linq-to-sql-vs-ado-net-with-stored-procedures – mituw16 May 08 '14 at 20:17
  • How do I edit my original question to show one of my queries. – user2789697 May 08 '14 at 20:18
  • 1
    Click on the link that says `edit` right below the question tags. – Dismissile May 08 '14 at 20:20
  • 1
    I don't see anything inherently wrong with your query, but if the list you are processing is a large list, then by the nature of the way the code is written, it will be slow. LINQ and EF are excellent for small transactions, but for something like this where you are doing bulk processing, you would be much better off using a stored procedure. Or if you insist on C# code, even old school `SqlCommand` queries will execute faster – mituw16 May 08 '14 at 20:23
  • Why are you searching email contact in database before adding/removing it from campaign.EmailContacts? Can you simply remove contact which you already have? – Sergey Berezovskiy May 08 '14 at 20:32
  • In the code shown above; I'm not deleting the email contact, but I am deleting the email contact from the campaign. All of the email contacts are in one big list with a checkbox. If the checkbox is checked then it is added to the campaign. If checkbox is unchecked then the email contact is removed from the campaign. Since the same form can check or uncheck boxes, the code above either adds or deletes from the campaign. That's probably as clear as mud. – user2789697 May 08 '14 at 20:38
  • Someone said that I should'nt use ToList() because it loads everything into memory. What should I use instead? – user2789697 May 08 '14 at 20:51
  • Could it be that you are issuing too many queries? Maybe something like this would be faster? var list = assignEmailContacts.Where(a => a.Assigned).Select(a => a.EmailContactId); foreach (var ec in db.EmailContacts.Where(a => list.Contains(a.EmailContactId)) { campaign.EmailContacts.Add(ec); TotalAssigned += 1; } – J.H. May 08 '14 at 21:00
  • Can you try to break you problem down: `if checkbox is checked then it [?e-mail?] is added to the campaign`. What kind of datatype is campaign - does every addition to campaign involve db access? This line `campaign.EmailContacts.Remove(db.EmailContacts.Find(emailContact.EmailContactId));` seems to get an EmailContacts object from the database and then removes it from the database. So this would be two roundtrips for every emailContat. With more code and type information we can help better. – surfmuggle May 08 '14 at 21:09
  • Check out [EF Profiler](http://www.hibernatingrhinos.com/products/efprof) it is a very handy tool and can show you a real-time report on each SQL statement your code is running. You should be able to track down the bottlenecks with this. – StanK May 08 '14 at 21:24
  • Nothing is faster than a data reader. Fewer things are as feature full than EF. Neither appear to be your problem. – SQLMason May 08 '14 at 21:35
  • I don't see a call to `SaveChanges()`? Take a look with SQL Server Profiler to see which and how many queries are generated. – user247702 May 08 '14 at 22:27

3 Answers3

3

The slow bits about EF are making lots of round-trips to the database when you could have got all of the data you needed and; tracking object changes. With this in mind there are two major performance tools you have at your disposal in EF:

  1. The Include extension method.
  2. The AsNoTracking extension method (only useful for reads, but gives a big performance boost on mid-sized+ datasets, can conflict with Include).

The first method allows you to pull down all of your data in a single query (providing there is no inheritance in your object graph). The second stops EF doing all of its tracking stuff, which makes reads much faster.

What I usually do when something is looking slow is fire up SQL Profiler, run a trace on my database and then run the method that is taking time. Then I add the includes until I'm down to a minimal number of queries.

Also remember you can add indexes and the like.

One more thing. Your database might be rubbish! If you have a database on a badly configured server with lots of other active databases your performance woes could simply be due to disk contention. If the EF and index tuning has little effect take a look at the server: hardware really does matter.

satnhak
  • 9,407
  • 5
  • 63
  • 81
  • +1 because you didn't said EF is great for smaller DBs. I use EF daily on large DBs but sometimes you must fire up SQL Profiler, Glimpse etc. to understand your queries, execution plans. As all tools EF can be abused. You can also use bounded contexts http://msdn.microsoft.com/en-us/magazine/jj883952.aspx – Matija Grcic May 08 '14 at 21:01
  • Thanks for the tip; we have a fairly large context, although only about 10% of it gets used most of the time, will read that through tomorrow. – satnhak May 08 '14 at 22:39
1

Updating multiple database rows in EF is slow!

I assume there is a table called CampaignContacts containing the data from the n-to-n relationship of Campaign and Contacts. Luckily EF6 allows you to execute raw queries.

using System.Collections.Generic;
using System.Data.Entity;
using System.Data.SqlClient;
using System.Text;

public class YourDbContext : DbContext
{
    public void DeleteBatchCampaignContacts(IList<int> ids)
    {
        if (ids == null) return;
        if (ids.Count == 0) return;         

        //ONLY because the parameter comes from a list of Int, else you might risk injection
        Database.ExecuteSqlCommand(string.Format("DELETE FROM CampainContacts WHERE CampaignId in ({0})", string.Join(",", ids)));
    }

    public void UpdateBatchCampaignContacts(int campaignId, IList<int> ids)
    {
        if (ids == null) return;
        if (ids.Count == 0) return;

        Database.ExecuteSqlCommand(string.Format("UPDATE CampaignContacts SET CampaignId = @campaignId WHERE EmailContactId in ({0})", string.Join(",", ids),
            new SqlParameter("@campaignId", campaignId)));
    }
}

Notice the code duplication. You can refactor these functions, but this would be quicker than your current solution. You can add features like executing the query in batches of 50 IDs in order to handle thousands of IDs updates. This is a start.

Agent Shark
  • 517
  • 2
  • 6
  • This looks like it might work for me. I'm not familiar with these queries and I have a bit to learn. Just three things though: 1) Getting error on Database.ExecuteSqlCommand. I have imported system.data.entity namespace. It says, Im missing object reference. and 2) How would I change this to add records based on imported IDs. Would I just change the ".Remove" to ".Add"? Finally where can I learn more on this. Thanks for your help. – user2789697 May 09 '14 at 15:36
  • For adding a large volume of new records, use `SqlBulkCopy`. – Agent Shark May 09 '14 at 16:00
  • Looks like I did something incorrectly. When I implemented your suggested code for deleting, I get an error. "Conversion failed when converting the nvarchar value '39432,39431,39430,39429,39428' to data type int." Any suggestions? – user2789697 May 09 '14 at 18:22
  • I'm working on that as we speak. You have been such a big help. Thank You. – user2789697 May 11 '14 at 04:01
0

When dealing with records in a RDBM I usually go with stored procedures, you'll have the advantage that it will execute faster.

May be just may be you could speed up your queries by hashing your emails as ID's and do a search base on those parameters, but dependends on phase or stage of your project.

You also could query directly to your database and see how long it takes then you'll know if it's your code or not

lumee
  • 613
  • 1
  • 5
  • 15