
I have an MVC5 application using EF6 and the MySQL database service from Azure (CleanDB MySQL). I have a CSV reader class, which works pretty fast in my console app. Then, in my MVC app, I have a model like this:

Provider.cs

public class Provider
{
    public int ProviderId { get; set; }

    public string ProviderName { get; set; }

    public string Email { get; set; }

    public string Address { get; set; }

    public string City { get; set; }

    [ForeignKey("State")]
    public int StateID { get; set; }

    public virtual State State { get; set; }

    public string Zip { get; set; }

    public string Phone { get; set; }

    public string Fax { get; set; }

    public string SICCode { get; set; }

    public string Description { get; set; }

    public string Website { get; set; }

    public string RefId { get; set; }
}

Then, in my Configuration.cs file, I have something like this:

string[] csvs = Directory.GetFiles(@"C:\Users\User\Downloads\db\test\");

foreach (string csv in csvs)
{
    using (CsvReader reader = new CsvReader(csv))
    {
        foreach (string[] values in reader.RowEnumerator)
        {
            var provider = new Provider();
            provider.ProviderName = values[0];
            provider.Email = values[1];
            provider.Address = values[2];
            provider.City = values[3];
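            // Note: values[4] from the CSV is never used, and StateID is hard-coded
            // rather than read from the file.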
            provider.StateID = 491;
            provider.Zip = values[5];
            provider.Phone = values[6];
            provider.Fax = values[7];
            provider.SICCode = values[8];
            provider.Description = values[9];
            provider.Website = values[10];
            provider.RefId = "";

            context.Providers.Add(provider);
        }
    }
}

context.SaveChanges();

The point is that I have about 50 CSV files, with a total size of 400 MB and around 9 million entries in total. Just for testing purposes I used the smallest CSV file in my list, a 2 MB file with around 15k entries. Executing Update-Database took around 40 minutes to put those entries into the database, so you can imagine how long 9 million entries would take. Any idea how to speed up the process?

tett
    You should try a different approach. Probably EF is not the best tool for massive loading of data. [Look here](http://stackoverflow.com/questions/2821725/is-that-possible-to-do-bulk-copy-in-mysql) and [here](http://stackoverflow.com/questions/4883376/how-to-bulk-insert-into-mysql-using-c-sharp) – Steve Feb 18 '15 at 17:31
  • Where does your instrumentation show the time is being spent? IE, I assume your logging solution has timestamps applied, so based on those timestamps what are the most expensive operations you're performing? When it comes to performance issues, measure measure measure! – Greg D Feb 18 '15 at 18:53
  • Steve got it right. EF is not meant for bulk operations (massive import/export). Each DB has its own feature for that. MSSQL, for instance, uses BCP. – Anderson Matos Feb 18 '15 at 21:15
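
For reference, the bulk-copy approach the commenters are pointing at would look roughly like the sketch below. It is only a sketch: it assumes the MySQL Connector/NET provider (MySql.Data), a Providers table whose columns line up with the CSV layout, and a connection string of your own; whether a hosted service such as CleanDB allows LOAD DATA at all is a separate question (see the follow-up comments under the answer).

using System;
using MySql.Data.MySqlClient;

// Sketch only: bulk-load one CSV file straight into the table with MySqlBulkLoader,
// which issues a single LOAD DATA statement instead of row-by-row INSERTs.
// The table name, connection string and file layout here are assumptions.
static void BulkLoadCsv(string connectionString, string csvPath)
{
    using (var connection = new MySqlConnection(connectionString))
    {
        connection.Open();

        var loader = new MySqlBulkLoader(connection)
        {
            TableName = "Providers",      // assumed target table
            FileName = csvPath,
            FieldTerminator = ",",
            LineTerminator = "\n",
            NumberOfLinesToSkip = 0,      // set to 1 if the files have a header row
            Local = true                  // LOAD DATA LOCAL INFILE (file lives on the client)
        };

        int rows = loader.Load();         // one statement per file
        Console.WriteLine("{0}: {1} rows loaded", csvPath, rows);
    }
}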

1 Answer


Are you sure it's actually your database performance that's the problem here? By not calling SaveChanges until the end, Entity Framework is batching everything in a single transaction, so aside from transport time, you just have the standard cost of simply adding 9 million entries to any database table. There's not a whole lot you can do about that, aside from giving more resources to your database server, which may or may not have an impact depending on how resource-constrained it currently is. Even in the worst scenarios, your transport time for the actual query shouldn't exceed a second unless you just have the world's worst connection or are trying to do this over dialup or something.

Likely, your biggest performance hit is in actually hitting the filesystem and reading from the CSV files. I'd look hard at your CsvReader and see if perhaps you can find a more performant replacement. Also, I'm not sure how this library works, but if it's streaming the file from the filesystem, it would probably be better to just read it all into memory at once (assuming you have enough RAM). Accessing many small clusters on a drive is typically slower than reading a single large cluster. A very rough analogy is copying files to a USB drive (since that's something pretty much all of us have experience with): it takes much longer to copy 4,000 1 KB files than it does to copy one 4 MB file, and it only gets worse from there.
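
To make the "read it all into memory at once" point concrete, here is a rough sketch. It bypasses the question's CsvReader entirely (its API isn't shown), uses File.ReadAllLines for one large sequential read per file, and does a naive comma split, which is an assumption that will break on quoted fields containing commas. The two timers show whether the minutes are going into reading/parsing/adding or into SaveChanges.

using System;
using System.Diagnostics;
using System.IO;

// Sketch: read each whole file into memory in one pass, parse the in-memory lines,
// and time the read/parse/Add phase and the SaveChanges call separately.
// "context" is the DbContext already available in Configuration.cs.
var loadTimer = Stopwatch.StartNew();

foreach (string csv in Directory.GetFiles(@"C:\Users\User\Downloads\db\test\"))
{
    foreach (string line in File.ReadAllLines(csv))   // whole file read at once
    {
        string[] values = line.Split(',');            // naive split: no quoted-field handling

        context.Providers.Add(new Provider
        {
            ProviderName = values[0],
            Email = values[1],
            Address = values[2],
            City = values[3],
            StateID = 491,
            Zip = values[5],
            Phone = values[6],
            Fax = values[7],
            SICCode = values[8],
            Description = values[9],
            Website = values[10],
            RefId = ""
        });
    }
}

loadTimer.Stop();

var saveTimer = Stopwatch.StartNew();
context.SaveChanges();
saveTimer.Stop();

Console.WriteLine("Read+Add: {0}  SaveChanges: {1}", loadTimer.Elapsed, saveTimer.Elapsed);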

Chris Pratt
  • I tried creating a console app, where I had `List all = new List();` and put all the column values of all the rows from my smallest CSV file into it (`all.Add(values[i]);`), and it finished in about 1 second using my CSV reader. When I did something like `all.Count`, there were about 160k entries in the list, which is correct I guess. It seems like my CSV reader does a good job while reading. – tett Feb 18 '15 at 18:07
  • Not reading from memory. Anything will perform well like that. I'm talking about reading from the filesystem. – Chris Pratt Feb 18 '15 at 18:11
  • I meant, my code was exactly like the code above, where I read from the CSV file on my computer, except that instead of creating a new Provider and assigning values to it, I created a single `List` and assigned the values there. Any code that may help me? I cannot do `MySqlBulkLoad`, because as far as I know CleanDB MySQL does not allow things like this due to security concerns. – tett Feb 18 '15 at 18:35