
I am creating a Web API that receives a stringified JSON payload. I deserialize it with JsonConvert and, before adding the records to the SQL database, I set two of the properties on each object in a for loop, but it takes more than 5 minutes before it finally hits db.SaveChanges(). The JSON contains 23,908 objects. I wanted to ask the community if there is a better implementation than mine that would speed up adding each object.

//This deserializes the stringified JSON and sets two properties on each object before adding it to the SQL database
public void SaveCSV(string file, string fileName)
{
  var csv = JsonConvert.DeserializeObject<List<SecurityFile>>(file);
  using (ApplicationDbContext db = ApplicationDbContext.Create())
  {
    //For loop that stamps the two values on each deserialized object
    for (var i = 0; i < csv.Count; i++)
    {
      csv[i].DateSubmitted = DateTime.Now;
      csv[i].FileName = fileName;
      db.SecurityFiles.Add(csv[i]);
    }
    //Saves it to the SQL database
    db.SaveChanges();
  }
}

//Here is my class
public class SecurityFile
{
  [Key]
  public int ID { get; set; }
  [JsonProperty("Plugin ID")]
  public string PluginId { get; set; }
  [JsonProperty("CVE")]
  public string CVE { get; set; }
  [JsonProperty("Risk")]
  public string Risk { get; set; }
  [JsonProperty("Host")]
  public string Host { get; set; }
  [JsonProperty("Protocol")]
  public string Protocol { get; set; }
  [JsonProperty("Port")]
  public string Port { get; set; }
  [JsonProperty("Name")]
  public string Name { get; set; }
  [JsonProperty("Synopsis")]
  public string Synopsis { get; set; }
  [JsonProperty("Description")]
  public string Description { get; set; }
  [JsonProperty("Solution")]
  public string Solution { get; set; }
  [JsonProperty("See Also")]
  public string SeeAlso { get; set; }
  [JsonProperty("FileName")]
  public string FileName { get; set; }
  [JsonProperty("DateSubmitted")]
  public DateTime DateSubmitted { get; set; }
}
Rethabile
  • What is slow: the var csv = JsonConvert... line or the for loop? (BTW, why don't you use foreach?) – gsharp Mar 23 '16 at 12:16
  • It's not the JsonConvert that is slow, it's the for loop that is taking up all the time. Does foreach have a performance advantage over the for loop? – Rethabile Mar 23 '16 at 12:18
  • Have a look at http://stackoverflow.com/questions/5940225/fastest-way-of-inserting-in-entity-framework/5940267#5940267; your best option is probably not to use EF for this. – Matt McCabe Mar 23 '16 at 12:28

3 Answers

4

There are several optimization options. I think you will profit if you parallelize the property assignments with the Task Parallel Library (TPL) and then add the entities all at once.

using System.Threading.Tasks;

public void SaveCSV(string file, string fileName)
{
    var csv = JsonConvert.DeserializeObject<List<SecurityFile>>(file);
    using (ApplicationDbContext db = ApplicationDbContext.Create())
    {
        var now = DateTime.Now;

        Parallel.ForEach(csv, item => {
            item.DateSubmitted = now;
            item.FileName = fileName;
        });

        //Attach the Entities all at once
        db.SecurityFiles.AddRange(csv);

        //Saves it to SQL Database
        db.SaveChanges();
    }
}
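
If very large lists are still slow, most of the remaining cost in Entity Framework is change tracking. A minimal sketch of the usual mitigation (an assumption on my part, following the fastest-way-of-inserting question linked in the comments, and assuming EF6, where Configuration.AutoDetectChangesEnabled is available):

using (ApplicationDbContext db = ApplicationDbContext.Create())
{
    // Change detection costs time per tracked entity; disable it while bulk-adding.
    db.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        db.SecurityFiles.AddRange(csv);
        db.SaveChanges();
    }
    finally
    {
        db.Configuration.AutoDetectChangesEnabled = true; // restore the default
    }
}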
KarmaEDV
  • WOW!!! It went from taking 5mins to taking 5 seconds. Thank you for the response and for providing the documentation as well. – Rethabile Mar 23 '16 at 12:30
  • One further optimization if you are OK with the objects all having the same Timestamp, only read DateTime.Now once. – KarmaEDV Mar 23 '16 at 12:37
  • @Rethabile: And processor utilization spikes to 100% ? Nice if it works, but be very careful with this... – Stefan Steiger Mar 23 '16 at 12:37
  • Do try adding bulk insert as well, and it'll get even faster. There's some sample bulk-insert code here: http://www.mikesknowledgebase.com/pages/LINQ/InsertAndDeletes.htm – Mike Gledhill Mar 23 '16 at 12:48
  • @KarmaEDV I don't think the speed comes from Parallel.ForEach but rather from db.SecurityFiles.AddRange(csv). Rethabile, can you try your original sample as it is, remove the .Add part from the for loop, and add .AddRange before .SaveChanges()? Just for the sake of curiosity :) – gsharp Mar 23 '16 at 12:49
  • @gsharp yes, I agree. The speed comes from the AddRange for sure. Still, TPL is the most underrated feature in the Framework and everybody should know about it :-) I also agree on the BulkInsert extension! – KarmaEDV Mar 23 '16 at 12:52
  • @StefanSteiger Somebody once said, that any CPU that is not constantly at 100% is a waste of ressources. He also thought that nobody would ever need more than 64KB of memory but that'd be off-topic. – KarmaEDV Mar 23 '16 at 12:56
  • @KarmaEDV: Yea, well the thing is, as long as you have 1 concurrent user, you can do such things - and pretty much anything; but otherwise, I wouldn't recommend living by that motto. – Stefan Steiger Mar 26 '16 at 02:19
1

20,000 objects isn't very much.
You should be able to do it in less than 700 ms.

Your way is very slow because, in effect, you do

foreach(jsonObject)
   connection.open()
   cmd.execute() // Individual insert-statement
   connection.close()

Opening and closing a connection is expensive, as is executing each command separately.

You should instead do

int batchSize = 100;
System.Text.StringBuilder sb = new System.Text.StringBuilder();
connection.Open();
transaction.Begin();
for (int i = 0; i < count; ++i)
{
    sb.Append("SQL-Insert;");
    if (i % batchSize == 0 && i != 0)
    {
        execute(sb.ToString());
        sb.Length = 0;
    }
}
if (sb.Length != 0)
    execute(sb.ToString());
transaction.Commit();
// TODO: Try/Catch + Rollback
connection.Close();

A much easier way is to fetch the schema of the table you want to insert your objects into with SELECT * FROM table_name WHERE (1=2), add the entries to the resulting DataTable, and then call DataAdapter.Update().
If your table has a primary key, you can even auto-generate the insert command using SqlCommandBuilder.

C#

public static void InsertUpdateDataTable(string strTableName, System.Data.DataTable dt)
{
    string strSQL = string.Format("SELECT * FROM [{0}] WHERE 1 = 2 ", strTableName.Replace("]", "]]"));

    using (System.Data.SqlClient.SqlDataAdapter daInsertUpdate = new System.Data.SqlClient.SqlDataAdapter(strSQL, getConnectionString())) {
        SqlCommandBuilder cmdBuilder = new SqlCommandBuilder(daInsertUpdate);
        daInsertUpdate.InsertCommand = cmdBuilder.GetInsertCommand();
        daInsertUpdate.UpdateCommand = cmdBuilder.GetUpdateCommand();

        daInsertUpdate.Update(dt);
    }

}

VB.NET:

Public Shared Sub InsertUpdateDataTable(strTableName As String, dt As System.Data.DataTable)
    Dim strSQL As String = String.Format("SELECT * FROM [{0}] WHERE 1 = 2 ", strTableName.Replace("]", "]]"))

    Using daInsertUpdate As New System.Data.SqlClient.SqlDataAdapter(strSQL, getConnectionString())
        Dim cmdBuilder As New SqlCommandBuilder(daInsertUpdate)
        daInsertUpdate.InsertCommand = cmdBuilder.GetInsertCommand()
        daInsertUpdate.UpdateCommand = cmdBuilder.GetUpdateCommand()

        daInsertUpdate.Update(dt)
    End Using

End Sub

You can set the batch size on the DataAdapter, and it generates the SQL for you as well.
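
For example, a minimal sketch (assuming the SqlDataAdapter setup from the snippet above; batching requires that the generated commands don't try to fetch results back per row):

daInsertUpdate.InsertCommand.UpdatedRowSource = System.Data.UpdateRowSource.None;
daInsertUpdate.UpdateCommand.UpdatedRowSource = System.Data.UpdateRowSource.None;
daInsertUpdate.UpdateBatchSize = 100; // rows per round-trip; 0 sends everything in one batch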

And in a database-agnostic way:

private static System.Data.Common.DbProviderFactory m_factory = System.Data.Common.DbProviderFactories.GetFactory(typeof(System.Data.SqlClient.SqlClientFactory).Namespace);


public static void InsertUpdateDataTable(string strTableName, System.Data.DataTable dt)
{
    if (dt == null)
        throw new System.ArgumentNullException("dt", "DataTable dt may not be NULL.");

    // https://msdn.microsoft.com/en-us/library/aadf8fk2(v=vs.110).aspx
    using (System.Data.Common.DbDataAdapter daInsertUpdate = m_factory.CreateDataAdapter())
    {

        using (System.Data.Common.DbConnection conn = m_factory.CreateConnection())
        {
            conn.ConnectionString = getConnectionString();

            daInsertUpdate.SelectCommand = conn.CreateCommand();
            daInsertUpdate.SelectCommand.CommandText = string.Format("SELECT * FROM [{0}] WHERE 1 = 2 ", strTableName.Replace("]", "]]"));

            using (System.Data.Common.DbCommandBuilder cmdBuilder = m_factory.CreateCommandBuilder())
            {
                cmdBuilder.DataAdapter = daInsertUpdate;
                daInsertUpdate.InsertCommand = cmdBuilder.GetInsertCommand();
                daInsertUpdate.UpdateCommand = cmdBuilder.GetUpdateCommand();
            } // End Using cmdBuilder

            daInsertUpdate.Update(dt);
        } // End Using conn
    } // End Using daInsertUpdate 

}

And using a transaction (assuming ls is a list of numbers):

private static System.Data.Common.DbProviderFactory m_factory = System.Data.Common.DbProviderFactories.GetFactory(typeof(System.Data.SqlClient.SqlClientFactory).Namespace);


public static string getConnectionString()
{
    System.Data.SqlClient.SqlConnectionStringBuilder csb = new System.Data.SqlClient.SqlConnectionStringBuilder();

    csb.DataSource = System.Environment.MachineName;
    csb.InitialCatalog = "TestDb";
    csb.IntegratedSecurity = true;

    return csb.ConnectionString;
}


public static System.Data.Common.DbConnection GetConnection()
{
    var con = m_factory.CreateConnection();
    con.ConnectionString = getConnectionString();

    return con;
}


public static int BatchedInsert(System.Collections.IList ls)
{
    int iAffected = 0;
    int batchSize = 100; // Each batch corresponds to a single round-trip to the DB.

    using (System.Data.IDbConnection idbConn = GetConnection())
    {

        lock (idbConn)
        {

            using (System.Data.IDbCommand cmd = idbConn.CreateCommand())
            {

                lock (cmd)
                {
                    if (cmd.Connection.State != System.Data.ConnectionState.Open)
                        cmd.Connection.Open();

                    using (System.Data.IDbTransaction idbtTrans = idbConn.BeginTransaction())
                    {

                        try
                        {
                            cmd.Transaction = idbtTrans;


                            System.Text.StringBuilder sb = new System.Text.StringBuilder();

                            for (int i = 0; i < ls.Count; ++i)
                            {
                                sb.Append("INSERT INTO T_TransactionInsertTest(TestValue) VALUES ( ");
                                sb.Append(ls[i].ToString());
                                sb.AppendLine(");");

                                if (i % batchSize == 0 && i != 0)
                                {
                                    cmd.CommandText = sb.ToString();
                                    iAffected += cmd.ExecuteNonQuery();
                                    sb.Length = 0;
                                }
                            }

                            if (sb.Length != 0)
                            {
                                cmd.CommandText = sb.ToString();
                                iAffected += cmd.ExecuteNonQuery();
                            }

                            idbtTrans.Commit();
                        } // End Try
                        catch (System.Data.Common.DbException ex)
                        {
                            if (idbtTrans != null)
                                idbtTrans.Rollback();

                            iAffected = -1;

                            //if (Log(ex))
                            throw;
                        } // End catch
                        finally
                        {
                            if (cmd.Connection.State != System.Data.ConnectionState.Closed)
                                cmd.Connection.Close();
                        } // End Finally

                    } // End Using idbtTrans

                } // End lock cmd

            } // End Using cmd 

        } // End lock idbConn

    } // End Using idbConn

    return iAffected;
} // End Function BatchedInsert 
Stefan Steiger
  • While your answer is very comprehensive, I think that if the OP asks for a way to improve his EF code you shouldn't answer with an ADO.NET approach. jm2c – KarmaEDV Mar 23 '16 at 12:40
  • @KarmaEDV he was asking for a better implementation, that doesn't limit it to EF. What if the number of objects is 10-100 times bigger next time. – Matt McCabe Mar 23 '16 at 12:45
  • I think `BULK INSERT` is best option to insert large number of rows – esiprogrammer Mar 23 '16 at 12:50
0

I would recommend inserting all rows with one SQL query rather than inserting each row in the loop.

You can insert all items into a temp table, then build a single insert-join statement and execute it outside of the loop, as sketched below.

Take a look at this question: bulk-record-update-with-sql.
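
A rough sketch of that idea (the staging table SecurityFilesStaging and its column list are illustrative assumptions, not part of the original code; the staging table would need to be filled first, e.g. with SqlBulkCopy):

using (ApplicationDbContext db = ApplicationDbContext.Create())
{
    // One set-based statement moves all staged rows across in a single round-trip.
    db.Database.ExecuteSqlCommand(@"
        INSERT INTO SecurityFiles
            (PluginId, CVE, Risk, Host, Protocol, Port, Name, Synopsis,
             Description, Solution, SeeAlso, FileName, DateSubmitted)
        SELECT PluginId, CVE, Risk, Host, Protocol, Port, Name, Synopsis,
               Description, Solution, SeeAlso, FileName, DateSubmitted
        FROM SecurityFilesStaging;");
}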

Alternatively, you can execute a BULK INSERT statement:

using (ApplicationDbContext db = ApplicationDbContext.Create())
{
   db.Database.ExecuteSqlCommand(@"BULK INSERT SecurityFiles
    FROM 'your file path'
    WITH
    (
        FIRSTROW = 1,
        FIELDTERMINATOR = ';',
        ROWTERMINATOR = '\n'
    )");
}
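
If the data is already in memory as deserialized objects rather than in a file the SQL Server can read, SqlBulkCopy gives a similar effect without the file requirement. A sketch (assumptions: the csv list from the question, a dbo.SecurityFiles destination table matching the EF mapping, and connectionString as a placeholder):

var dt = new System.Data.DataTable();
foreach (var col in new[] { "PluginId", "CVE", "Risk", "Host", "Protocol", "Port",
                            "Name", "Synopsis", "Description", "Solution", "SeeAlso", "FileName" })
    dt.Columns.Add(col, typeof(string));
dt.Columns.Add("DateSubmitted", typeof(DateTime));

foreach (var item in csv)
    dt.Rows.Add(item.PluginId, item.CVE, item.Risk, item.Host, item.Protocol, item.Port,
                item.Name, item.Synopsis, item.Description, item.Solution, item.SeeAlso,
                item.FileName, item.DateSubmitted);

using (var bulk = new System.Data.SqlClient.SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "dbo.SecurityFiles"; // assumption: EF's default table name
    foreach (System.Data.DataColumn c in dt.Columns)
        bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName); // identity ID column is left to the server
    bulk.BatchSize = 5000; // rows per batch sent to the server
    bulk.WriteToServer(dt);
}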
esiprogrammer
  • This would sort that; the reason it's slow is that row-by-row inserts are slow, and bulk insert is much faster. – Matt McCabe Mar 23 '16 at 12:15
  • I'm no EF expert, but the data is written to the db only when SaveChanges is called? Does adding an item to a DbSet take so much time @MattMcCabe? – gsharp Mar 23 '16 at 12:21
  • Sorry @gsharp, I missed the EF angle (I don't use it); I don't think EF is capable of bulk insert out of the box. This is probably the right question to be asking: http://stackoverflow.com/questions/5940225/fastest-way-of-inserting-in-entity-framework/5940267#5940267 – Matt McCabe Mar 23 '16 at 12:27