0

Currently I am working on a project in which we read, process and store products. We are using Entity Framework 6 to read and write to a MySql database.

After building the prototype and fetching some statistics, we found that storing new products in the database takes (relatively) much time. I have been asked to improve this, but I can't really figure out what the best option is.

Currently, every read and write happens in a using block. Since this was my first time using Entity Framework 6, I did my research and the vast majority of StackOverflow said that you should always use a using block. So I did.

Code snippet of how it looks now;

public int GetSomeId(string SomeStringToMatchWith)
{
    using (var db = new MyDbContext())
    {
        return db.SomeTable.Where(t => t.SomeString == SomeStringToMatchWith).FirstOrDefault().id;
    }
}

public void SaveSomeData(int SomeId)
{
    using(var db = new MyDbContext())
    {
        db.SomeTable.Add(new SomeTable{ Id = SomeId });
        db.SaveChanges();
    }
}

I have been told that MySql would work faster if it is exposed to bulk data rather than to single data inserts. Also, after reading this question, I figured it might be better for the program to not immediately write the data to the database (and therefore, to not use using), but to create a simple Repository that saves data and write it to the database after a certain amount of time. Since I am willing to acccess the Repository through multiple threads, I figured a Singleton design would satisfy.

There is one nasty requirement though; the products have to be matched by certain values, and it could happen that product #2 has a match with product #1. In other words, I always need to be able to access the most recent data.

Something like this came to my mind;

public class Repository
{
    private static readonly object Lock = new object();

    private MyDbContext context { get; set; }

    private Repository()
    {        
        context = new MyDbContext();
    }

    private static Repository _Instance;
    public static Repository Instance
    {
        get
        {
            if (_Instance == null)
            {
                lock(Lock)
                {
                    if(_Instance == null)
                    {
                        _Instance = new Repository();
                    }
                }                    
            }
            return _Instance;
        }
    }

    //This method is called once in a while
    public void Commit()
    {
        context.SaveChanges();
        context.Dispose();                //Get rid of entities
        context = new MyDbContext();      //Create a fresh DbContext
    }   

    //Other Read/Write methods  
}

There are a few questions, actually;

  • Is it a hard task to make this whole class thread safe? Do I need to add a lock to every table to do so, or is there a smarter way?
  • Would it actually increase the performance? The linked question above does make me think it does.
  • Since this is not the recommended way and most of you will (probably) disagree; would there be a better way to implement this?

Please note that the current version works as-is. The program needs to process about 2.500.000 products and the only bottleneck seems to be the writing to the database. As a final note, I have also read Implementing the Repository and Unit of Work Patterns... which clearly tells me how to do it, but does not give me information why I should or should not use it.

Community
  • 1
  • 1
Dion V.
  • 2,090
  • 2
  • 14
  • 24
  • Bunch of good questions! First, you're on right track of creating repository as it can at least alleviate constant opening and closing of DB. I'm not too familiar with Locking at DB layer but, would recommend that as it's a common thing for a DB to lock tables or rows. Locking at application layer is ok too and you are on the right track. The only problem locking there is that it could be too broad. But I do have some very good news for you which I'll post a an answer as it can eliminate issues and give you best of EF And SQL worlds at same time. – JWP Nov 04 '14 at 15:09
  • 1
    *Since I am willing to acccess the Repository through multiple threads, I figured a Singleton design would satisfy.* There is nothing about a singleton that guarantees it to be thread-safe. You will have to explicitly make it so throughout your code just as you would if you'd be using an instance. – Yuval Itzchakov Nov 04 '14 at 15:10
  • Do also see [this](http://stackoverflow.com/questions/26095431/outofmemory-when-removing-rows-500000-entityframework-6) question for reference. As I understand it EF is not built for big bulk operations. – default Nov 04 '14 at 15:12
  • you can also play around with the [AutoDetectChanges](http://msdn.microsoft.com/sv-se/data/jj556205.aspx) to improve performance. It has other bottlenecks in that you would need to keep track of changes.. but if it's only new entities you are adding I think it should be ok. – default Nov 04 '14 at 15:20
  • The commit method above will require a subsequent open of the DBContext. Keep that in mind. – JWP Nov 04 '14 at 15:32
  • 2
    There is also a project on CodePlex for Bulk Inserts https://efbulkinsert.codeplex.com/ – JWP Nov 04 '14 at 15:33

1 Answers1

0

EF is very powerful and allows the designer of the application to determine how deep into it they want to go! What does this mean? It means you can either do everything through EF or when needed you can interface directly with SQL... Below are a few examples of this:

//The normal EF pattern is:
using(var db = new Myentities()){//do something here}

The method above favors strong typing all the way, it allows for perfect LINQ integration and produces very nicely (perfectly?) formed queries.

//However you can use EF to hit SQL directly.
using(var db= new MyEntiteis()){ var stuff = db.DataBase<passInStrongType>(query,parms)

The query parameter is a string, with parameter placeholders like this.

"Select * from Table where Field== @Parm1"  //works for inserts, updates and deletes too.

The second value is often an SQLParameter Array like this:

 var parms = new List<SqlParameter>()
 parms.Add(new SqlParameter { Name="Parm1", value = "myvalue" }).ToArray();

Now the beautify of this Hybrid solution is that you can get the absolute fastest SQL response because you are hitting SQL directly, but you pick up the ability to have results returned using Strongly-typed models.

Any type of query is allowed, Updates, Inserts, Deletes...

This hybrid approach get's you closest to SQL layer whild still staying within EF. Any collection returned still allows for LINQ. This to me is the best of two worlds.

JWP
  • 6,672
  • 3
  • 50
  • 74
  • *Is it a hard task to make this whole class thread safe? Do I need to add a lock to every table to do so, or is there a smarter way?* - I cannot find the answer to that question here.. – default Nov 04 '14 at 15:20
  • *Would it actually increase the performance?* I cannot find the answer to that question here. – default Nov 04 '14 at 15:21
  • The question is about adding to a database through EntityFramework. What does an sql select query have to do with that? I fail to see how this answer would help the OP. – default Nov 04 '14 at 15:23
  • Some will say yes, others will say "it depends" First off query performance is a function of how the query is written. There's a lot of folks who know the EF side on writing well performing queries, however, there's a lot more DBAs who understand query performance using tools like SSMS. Bottom line is that the closer you get to the "metal" the faster things go. If speed is primary importance then interfacing directly to SQL is good. – JWP Nov 04 '14 at 15:24
  • I'm sorry, but I also fail to see this as an answer... Perhaps you misinterpreted my question? – Dion V. Nov 04 '14 at 15:24
  • Thread safety is a topic unto it's own. There's a ton of articles out there, but as I mentioned before, pushing Record locking onto the DB is good because it does or is supposed to handle this all the time. I know that there is at least two levels of locking Table and Row, but I think it can get much more specific than that. – JWP Nov 04 '14 at 15:26
  • In response to this question: "After building the prototype and fetching some statistics, we found that storing new products in the database takes (relatively) much time. I have been asked to improve this, but I can't really figure out what the best option is." I showed you the fastest way to integrate EF with SQL. I also agreed that the repository solution will alleviate connection lags. – JWP Nov 04 '14 at 15:28
  • @DionV. Your focus on this question was client side, thus your path in the respository model. Making things faster because most queries may already be in memory. All I was showing you was how to bypass all EF layers to go directly to the server side. DBAs are very good at fine tuning their servers and are willing to analyze queries for best results. I'm convinced that most programmers don't know much about writing well performing queries and LINQ doesn't help. – JWP Apr 07 '15 at 17:24