
Before this gets marked as a duplicate: I have seen many answers like Convert IEnumerable to DataTable and have attempted something similar by creating an extension method. I'm asking my question here because the issue I'm having may lie somewhere else.

Essentially I have quite a large IEnumerable<T> (around 16-17 million items). Up to this point I have not really had any issues with it, until I tried converting it to a DataTable using an extension method:

/// <summary>
/// Converts an IEnumerable to a DataTable. Mainly for use with SqlBulkCopy.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="collection"></param>
/// <param name="customColumnOrder">Custom order for columns; lets me make sure the column order is always the same. I am open to suggestions for better ways to do this</param>
/// <returns></returns>
public static DataTable ToDataTable<T>(this IEnumerable<T> collection, List<Tuple<string, int, int>> customColumnOrder)
{
    DataTable dt = new DataTable();
    var type = collection.First().GetType();

    foreach (var column in customColumnOrder)
    {
        dt.Columns.Add(column.Item1, Nullable.GetUnderlyingType(type.GetProperty(column.Item1).PropertyType) ?? type.GetProperty(column.Item1).PropertyType);
    }

    // Populate the table
    foreach (T item in collection)
    {
        DataRow dr = dt.NewRow();
        dr.BeginEdit();

        foreach (var column in customColumnOrder)
        {
            dr[column.Item1] = type.GetProperty(column.Item1).GetValue(item) ?? DBNull.Value;
        }

        dr.EndEdit();
        dt.Rows.Add(dr);
    }

    return dt;
}

This works fine for smaller tables of around 100,000 items, but begins to really struggle once it gets into the millions; I just keep getting timeouts. Is there a more efficient or generally better way to convert from an IEnumerable<T> to a DataTable?

I am converting to DataTable so I can use SqlBulkCopy to get the data into a database.

EDIT: here's where the data is being passed in from:

    /// <summary>
    /// SqlBulkCopy for saving large amounts of data
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="dataToSave"></param>
    /// <param name="modelManager">Custom manager to use alongside the model</param>
    /// <param name="conn">Connection string to DB</param>
    public void BatchSave<T>(IEnumerable<T> dataToSave, IData modelManager, string conn)
    {
        var model = dataToSave.First();

        using (SqlConnection sqlconn= new SqlConnection(conn))
        {
            sqlconn.Open();

            using (SqlCommand cmd = new SqlCommand(GetCreateScript(modelManager, model), sqlconn))
            {
                //Create temp table to do initial insert into
                cmd.ExecuteNonQuery();

                SqlBulkCopy copy = new SqlBulkCopy(cmd.Connection);

                copy.DestinationTableName = "#tempTableForImport";

                // Convert data to DataTable
                DataTable dt = dataToSave.ToDataTable(modelManager.GetDataColumnsOrder());

                // Copy to temp table
                copy.WriteToServer(dt);
            }

            using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { CommandType=CommandType.StoredProcedure })
            {
                // Clean up data and move to final table
                cmd.ExecuteNonQuery();
            }

            sqlconn.Close();
        }
    }

EDIT #1: here's the amended code using a suggestion made below; it now uses FastMember:

public void BatchSave<T>(IEnumerable<T> dataToSave, IData modelManager, string conn)
{
    var model = dataToSave.First();

    using (SqlConnection sqlconn = new SqlConnection(conn))
    {
        sqlconn.Open();

        using (var bcp = new SqlBulkCopy(sqlconn))
        {
            using (var reader = ObjectReader.Create(dataToSave, modelManager.GetDataColumnsOrder().Select(s => s.Item1).ToArray()))
            {
                using (SqlCommand cmd = new SqlCommand(GetCreateScript(modelManager, model), sqlconn))
                {
                    cmd.ExecuteNonQuery();
                    bcp.DestinationTableName = "#tempTableForImport";
                    bcp.WriteToServer(reader);
                }

                using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { CommandType = CommandType.StoredProcedure })
                {
                    cmd.ExecuteNonQuery();
                }
            }
        }

        sqlconn.Close();
    }
}

This has sped things up; however, I am still getting a "Timeout Expired" (after about 30 seconds) on this line: bcp.WriteToServer(reader);.

Thanks all for the help so far. Any more ideas on this? Maybe some way to increase the length of time before the timeout?

4 Answers


Instead of going through a DataTable, I would implement an IDataReader for your collection and feed that to SqlBulkCopy. If done correctly, and using a lazy IEnumerable, it would be much faster and use much less memory than the DataTable route. Marc Gravell has already written a library for exposing IEnumerables as an IDataReader, and I would recommend you check that out before rolling your own.

FastMember is available on NuGet here: https://www.nuget.org/packages/FastMember/ with the original source here: https://code.google.com/p/fast-member/ and an example in this thread: SqlBulkCopy from a List<>
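
For reference, a minimal sketch of that approach (FastMember's ObjectReader wrapped around a SqlBulkCopy). The column names "Col1"/"Col2" and the destination table "dbo.TargetTable" are placeholders, not taken from the question:

using System.Collections.Generic;
using System.Data.SqlClient;
using FastMember;

static void BulkCopy<T>(IEnumerable<T> rows, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var bcp = new SqlBulkCopy(conn))
        // ObjectReader exposes the IEnumerable as an IDataReader over the named properties
        using (var reader = ObjectReader.Create(rows, "Col1", "Col2"))
        {
            bcp.DestinationTableName = "dbo.TargetTable";
            bcp.WriteToServer(reader); // streams rows; no intermediate DataTable
        }
    }
}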

UPDATE: You may also need to increase your command timeout and set the batch size on the SqlBulkCopy, like this:

using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn) { 
  CommandType = CommandType.StoredProcedure, CommandTimeout=300 })

and

bcp.BatchSize = 100000;
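
Applied to the code in the question, those two settings might look like the sketch below; 300 seconds and 100,000 rows are example values only. SqlBulkCopy also exposes a BulkCopyTimeout property (default 30 seconds), which governs WriteToServer itself, so it may be worth raising as well:

// Example values; tune BatchSize and the timeouts for your data volume
using (var bcp = new SqlBulkCopy(sqlconn))
{
    bcp.DestinationTableName = "#tempTableForImport";
    bcp.BatchSize = 100000;
    bcp.BulkCopyTimeout = 300; // seconds allowed for the whole bulk copy

    bcp.WriteToServer(reader);
}

using (SqlCommand cmd = new SqlCommand(modelManager.GetInsertSproc(), sqlconn)
    { CommandType = CommandType.StoredProcedure, CommandTimeout = 300 })
{
    cmd.ExecuteNonQuery();
}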
Robert McKee
  • This is a different question, as it is likely the command you are running after the bulk insert that is timing out. Just add `cmd.CommandTimeout=300;` before `cmd.ExecuteNonQuery()` – Robert McKee Oct 02 '15 at 21:18
  • Or you might be able to change `CommandType = CommandType.StoredProcedure` to `CommandType = CommandType.StoredProcedure, CommandTimeout=300` in your using statement as well. 300 is just an example, but that is a 5 minute timeout. – Robert McKee Oct 02 '15 at 21:19
  • Thank you, I'll try these; however, when debugging, it's on the WriteToServer call where the timeout occurs – Srb1313711 Oct 02 '15 at 21:23
  • As far as answering my original question goes, this was the best answer. However, to see the final resolution to my problem, see my answer – Srb1313711 Oct 05 '15 at 08:57
  • Added your changes to my answer so that it is complete. Hopefully it helps other people in the future! – Robert McKee Oct 05 '15 at 16:17

With performance issues it's kind of hard to provide specific fixes, but I would remove anything you don't actually need, starting with the BeginEdit() and EndEdit() calls. You are creating a new row; unless you need an explicit row state for something not shown in your question, those calls are doing extra work you probably don't need.
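
For example, here is the population loop from the question with those calls removed (the PropertyInfo lookups are also hoisted out of the row loop here, which is an extra, assumed micro-optimisation on top of the suggestion above):

// Sketch: uses the question's dt, type, collection and customColumnOrder variables
var properties = customColumnOrder
    .Select(c => type.GetProperty(c.Item1))
    .ToArray();

foreach (T item in collection)
{
    DataRow dr = dt.NewRow();

    for (int i = 0; i < properties.Length; i++)
    {
        // Columns were added in customColumnOrder order, so the ordinal lines up
        dr[i] = properties[i].GetValue(item) ?? DBNull.Value;
    }

    dt.Rows.Add(dr);
}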

Another thing worth trying is to split your collection into chunks, use Parallel.For/ForEach to build a DataTable for each chunk, and then use DataTable.Merge() to merge them back together and return the result.
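
A rough sketch of that chunk-and-merge idea, reusing the ToDataTable extension from the question (the chunk size is an arbitrary example value):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

public static DataTable ToDataTableParallel<T>(this IEnumerable<T> collection,
    List<Tuple<string, int, int>> customColumnOrder, int chunkSize = 500000)
{
    // Split the source into chunks of chunkSize items
    var chunks = collection
        .Select((item, index) => new { item, index })
        .GroupBy(x => x.index / chunkSize, x => x.item)
        .Select(g => g.ToList())
        .ToList();

    var tables = new ConcurrentBag<DataTable>();

    // Build one DataTable per chunk in parallel
    Parallel.ForEach(chunks, chunk => tables.Add(chunk.ToDataTable(customColumnOrder)));

    // Merge the per-chunk tables back into a single result
    var result = tables.First();
    foreach (var table in tables.Skip(1))
    {
        result.Merge(table);
    }

    return result;
}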

StingyJack

Don't convert. The DataTable is the memory hog, and it is what's timing out. You can use a TVP (table-valued parameter) to load data very fast; it is like a reverse data reader. To feed it from an IEnumerable (rather than a DataTable), use SqlDataRecord.
Just one link: search on TVP and SqlDataRecord.
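
A hedged sketch of that approach: it assumes a user-defined table type dbo.MyTableType (Id INT, Name NVARCHAR(100)) and a stored procedure dbo.usp_BulkInsert with a @rows parameter of that type; all of these names, and the MyItem class, are illustrative only:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

class MyItem { public int Id; public string Name; }

static IEnumerable<SqlDataRecord> ToSqlDataRecords(IEnumerable<MyItem> items)
{
    var metaData = new[]
    {
        new SqlMetaData("Id", SqlDbType.Int),
        new SqlMetaData("Name", SqlDbType.NVarChar, 100)
    };

    // Yielded one record at a time, so the whole set is never held in memory
    foreach (var item in items)
    {
        var record = new SqlDataRecord(metaData);
        record.SetInt32(0, item.Id);
        record.SetString(1, item.Name);
        yield return record;
    }
}

static void BulkInsert(IEnumerable<MyItem> items, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.usp_BulkInsert", conn) { CommandType = CommandType.StoredProcedure })
    {
        var p = cmd.Parameters.AddWithValue("@rows", ToSqlDataRecords(items));
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.MyTableType"; // must match the table type defined on the server

        conn.Open();
        cmd.ExecuteNonQuery();
    }
}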

paparazzo
  • Come on, down vote: what is the problem? I use this to load millions and millions of records very fast. – paparazzo Oct 02 '15 at 19:44

Essentially I have quite a large IEnumerable (around 16-17 million items). Up to this point I have not really had any issues with this, until I tried converting to a DataTable using an extension method:

As per the documentation, the upper limit on rows in a DataTable is 16,777,216

The maximum number of rows that a DataTable can store is 16,777,216. For more information, see Adding Data to a DataTable.

rism