
I have a DataTable with 1123 records. I want to split this table into 5 separate fixed-size DataTables. The size limit for each table is 225 rows.

So the sizes of the resulting DataTables will be:

DT1 : 225 rows
DT2 : 225 rows
DT3 : 225 rows
DT4 : 225 rows
DT5 : 223 rows (remaining rows)
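
For reference, the chunk count and the size of the last chunk fall out of simple ceiling-division arithmetic (a sketch, not part of the original code):

int totalRows = 1123;
int chunkSize = 225;
int chunkCount = (totalRows + chunkSize - 1) / chunkSize;     // ceiling division: 5
int lastChunkSize = totalRows - (chunkCount - 1) * chunkSize; // 1123 - 4 * 225 = 223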

I was able to find how to split a DataTable based on a column value using LINQ here.

I also found a way to split a DataTable into multiple tables here. I wanted to know if there's a better way of doing this. Posting the code from the link:

private static List<DataTable> SplitTable(DataTable originalTable, int batchSize)
{
    List<DataTable> tables = new List<DataTable>();
    int i = 0;
    int j = 1;
    DataTable newDt = originalTable.Clone();
    newDt.TableName = "Table_" + j;
    newDt.Clear();
    foreach (DataRow row in originalTable.Rows)
    {
        DataRow newRow = newDt.NewRow();
        newRow.ItemArray = row.ItemArray;
        newDt.Rows.Add(newRow);
        i++;
        if (i == batchSize)
        {
            tables.Add(newDt);
            j++;
            newDt = originalTable.Clone();
            newDt.TableName = "Table_" + j;
            newDt.Clear();
            i = 0;
        }
    }
    // Note: rows left over after the last full batch never get added to the list.
    return tables;
}

I need help splitting the DataTable into fixed-size chunks.

Nilesh Barai
  • What's wrong with the current way that you are doing it? Is it yielding the 5 tables with 225 records each in them? – MethodMan Jan 07 '16 at 19:59
  • It's working well; I am worried about efficiency. The actual data this function will operate on will be large. Also, it's always better to code in a small set of lines. – Nilesh Barai Jan 07 '16 at 20:08
  • Actually, this question is more appropriate at [CodeReview](http://codereview.stackexchange.com/). – Gert Arnold Jan 10 '16 at 22:12
  • @GertArnold It doesn't sound like the author's code is working correctly yet ("Need help in splitting datatable into fixed size") which would mean it is off-topic on Code Review, until it works as the author intends. If you have any questions or concerns, join us at our [CR Help Desk](http://chat.stackexchange.com/rooms/34045). – Phrancis Jan 10 '16 at 22:16
  • Ah, okay. Then if it is working well but they want to improve efficiency, this would be OK for Code Review. – Phrancis Jan 10 '16 at 22:18
  • you can check this one https://www.codeproject.com/Tips/4057734/Split-DataTable-into-Multiple-DataTables-Csharp – Dipon Roy Jul 31 '19 at 06:34

5 Answers


I once made this little extension method:

public static IEnumerable<IEnumerable<T>> ToChunks<T>(this IEnumerable<T> enumerable,
                                                      int chunkSize)
{
    int itemsReturned = 0;
    var list = enumerable.ToList(); // Prevent multiple execution of IEnumerable.
    int count = list.Count;
    while (itemsReturned < count)
    {
        int currentChunkSize = Math.Min(chunkSize, count - itemsReturned);
        yield return list.GetRange(itemsReturned, currentChunkSize);
        itemsReturned += currentChunkSize;
    }
}

that cuts any IEnumerable into chunks of the specified chunk size.

Having this, you can simply do:

var tables = originalTable.AsEnumerable().ToChunks(225)
                          .Select(rows => rows.CopyToDataTable());

The reason this could perform better than a straightforward foreach is that list.GetRange is a very efficient method for getting a range of rows from a list. I'm curious to know what you'll find out.
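
A minimal timing sketch, in case you want to measure it yourself (illustrative only; SplitTable is the method from the question):

var sw = System.Diagnostics.Stopwatch.StartNew();
var chunked = originalTable.AsEnumerable().ToChunks(225)
                           .Select(rows => rows.CopyToDataTable())
                           .ToList(); // force evaluation of the lazy chunks
sw.Stop();
Console.WriteLine("ToChunks:   " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var looped = SplitTable(originalTable, 225); // row-by-row copy from the question
sw.Stop();
Console.WriteLine("SplitTable: " + sw.ElapsedMilliseconds + " ms");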

Gert Arnold
  • @GertArnold How can I make a vb.net version for this, as vb does not support yield? – Smith Apr 10 '17 at 02:04
  • I am using VS 2008 – Smith Apr 10 '17 at 10:48
  • How can I loop through it to get column data using `foreach(DataRow in tables)` ? – Moeez Apr 16 '19 at 04:41
  • Outstanding solution! Thanks – Mad Dog Tannen Oct 29 '19 at 07:59
  • @GertArnold I use your extension method to split a DataTable with 5 million rows, but I get an out-of-memory exception. Can you help me avoid it? The error is thrown at `System.Data.Common.StringStorage.SetCapacity(Int32 capacity)`. – Hasan Fathi Mar 26 '20 at 05:42
  • @HasanFathi Not from a single comment. You should post this in a new question. Or try a streaming solution with Skip/Take, which will take considerably longer but won't eat that much memory. – Gert Arnold Mar 26 '20 at 09:42
  • A far better method with lazy evaluation, possible by more modern C# syntax, is [here](https://stackoverflow.com/a/68902864/861716). – Gert Arnold Aug 30 '21 at 18:27
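
For reference, a streaming Skip/Take variant along the lines suggested in the comments above might look like this (a sketch; each chunk re-enumerates the table, so it trades speed for memory):

public static IEnumerable<DataTable> ToChunksStreaming(this DataTable table, int chunkSize)
{
    for (int skip = 0; skip < table.Rows.Count; skip += chunkSize)
    {
        // Only one chunk is materialized at a time, at the cost of
        // re-enumerating the table's rows for every chunk.
        yield return table.AsEnumerable()
                          .Skip(skip)
                          .Take(chunkSize)
                          .CopyToDataTable();
    }
}
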
private static List<DataTable> SplitTable(DataTable originalTable, int batchSize)
{
    List<DataTable> tables = new List<DataTable>();
    int i = 0;
    int j = 1;
    DataTable newDt = originalTable.Clone();
    newDt.TableName = "Table_" + j;
    newDt.Clear();
    foreach (DataRow row in originalTable.Rows)
    {
        DataRow newRow = newDt.NewRow();
        newRow.ItemArray = row.ItemArray;
        newDt.Rows.Add(newRow);
        i++;
        if (i == batchSize)
        {
            tables.Add(newDt);
            j++;
            newDt = originalTable.Clone();
            newDt.TableName = "Table_" + j;
            newDt.Clear();
            i = 0;
        }
    }
    // Unlike the code in the question, this picks up the final partial batch.
    if (newDt.Rows.Count > 0)
    {
        tables.Add(newDt);
    }
    return tables;
}

foreach (var dt1 in SplitTable(table1, 2))
{
    DataTable dt = dt1;
}
Shridhar

Another way of doing it for lazy people :)

// TypeDescriptor and PropertyDescriptor come from System.ComponentModel.
private static DataTable GetDataTable<T>(IEnumerable<T> data, int skip, int take)
{
    var properties = TypeDescriptor.GetProperties(typeof(T));

    var dataTable = new DataTable();
    foreach (PropertyDescriptor prop in properties)
        dataTable
            .Columns
            .Add(prop.Name, Nullable.GetUnderlyingType(prop.PropertyType)
                            ?? prop.PropertyType);

    foreach (var item in data.Skip(skip).Take(take))
    {
        var row = dataTable.NewRow();
        foreach (PropertyDescriptor prop in properties)
            row[prop.Name] = prop.GetValue(item) ?? DBNull.Value;

        dataTable.Rows.Add(row);
    }
    return dataTable;
}

And the client will call it like this:

var nthDataTable = GetDataTable(model, skip: n, take: m);
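
If you need all the fixed-size tables rather than a single page, you could loop over the source (a sketch; model is assumed to be a List<T> here):

var tables = new List<DataTable>();
int pageSize = 225;
for (int skip = 0; skip < model.Count; skip += pageSize)
    tables.Add(GetDataTable(model, skip, take: pageSize));
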
Gabriel Marius Popescu
  • I want to split it and then take each 1000 record can you please see my [question](https://stackoverflow.com/questions/55688229/how-to-split-data-table-into-multiple-tables-with-adding-1-minute-delay)? – Moeez Apr 16 '19 at 04:43

Maybe I'm too late, but the best way I found was this, turning my DataTable into a List of DataRows:

public static List<DataTable> Testes(DataTable yourDataTableToSplit, int numberOfDatatablesYouWant)
{
    List<DataRow> rows = yourDataTableToSplit.AsEnumerable().ToList();
    List<DataTable> result = new List<DataTable>();
    // Ceiling division, so the leftover rows end up in the last table
    // instead of spilling into an extra one.
    int rowsToTake = (rows.Count + numberOfDatatablesYouWant - 1) / numberOfDatatablesYouWant;
    while (rows.Count > 0)
    {
        result.Add(rows.Take(rowsToTake).CopyToDataTable());
        rows = rows.Skip(rowsToTake).ToList();
    }
    return result;
}
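
With the question's numbers, splitting into 5 would look like this (a sketch; yourBigTable is a placeholder):

// 1123 rows into 5 tables -> four tables of 225 rows and a last one of 223.
List<DataTable> parts = Testes(yourBigTable, 5);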

I used it to bulk-insert data into SQL Server, like this:

public static List<string> DataBulkInsert(DataTable dtDataInsert, SqlConnection conn, string tableName)
{
    List<string> errors = new List<string>();
    int batchSize = dtDataInsert.Rows.Count;

    bool lastEntry = false;

    List<DataRow> bulkData = dtDataInsert.AsEnumerable().ToList();

    // Retry failed rows with progressively smaller batches,
    // halving the batch size until single-row inserts have been tried.
    while (bulkData.Count > 0 && batchSize > 0)
    {
        bulkData = SQLBulkInsert(bulkData, conn, dtDataInsert.TableName, batchSize, ref errors);
        if (batchSize % 2 == 0)
            batchSize /= 2;
        else if (batchSize > 1)
            batchSize = (batchSize + 1) / 2;
        else if (batchSize == 1 && !lastEntry)
            lastEntry = true;
        else
            batchSize = 0;
    }

    return errors.GroupBy(x => x).Select(x => x.FirstOrDefault()).ToList();
}

public static List<DataRow> SQLBulkInsert(List<DataRow> listToInsert, SqlConnection conn, string tableName, int batchSize, ref List<string> errors)
{
    List<DataRow> dataError = new List<DataRow>();
    List<DataRow> dataToTry = listToInsert.Take(batchSize).ToList();

    while (dataToTry.Count > 0)
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(conn))
        {
            bulkCopy.BulkCopyTimeout = 120;
            bulkCopy.DestinationTableName = tableName;
            try
            {
                bulkCopy.WriteToServer(dataToTry.CopyToDataTable());
            }
            catch (Exception ex)
            {
                // Collect the message and keep the failed rows for the next, smaller pass.
                errors.Add(ex.Message);
                dataError.AddRange(dataToTry);
            }

            listToInsert = listToInsert.Skip(batchSize).ToList();
            dataToTry = listToInsert.Take(batchSize).ToList();
        }
    }
    return dataError;
}
  • This is the same idea that the poster was applying in the question, right? The question was asking for a more efficient way to do it. This answer just reposts working code that the OP already had years ago. – John Glenn Jan 28 '22 at 19:46
  • I agree. And why such a specialized method? A generic method is a versatile tool that serves an entire code base and can be used here just as well. – Gert Arnold Jan 28 '22 at 19:53
  • BTW, you may want to rename your method, it's a bit... ambiguous. – Gert Arnold Jan 28 '22 at 19:54
  • For sure, I was just showing how I used it; those aren't really the method or variable names from my code. I think using LINQ instead of foreach is faster than the approach shown in the question. – Marcos Santin Jan 31 '22 at 12:22

The solutions given here did not work for me: if the last set of records is smaller than the desired chunk size, they simply ignore those records, resulting in losing them. For example, if there are 5 records and the chunk size is 2, they create only 2 DataTables, ignoring the last record.

Here is the corrected code which worked for me in all scenarios.

Users working in VB.NET may not always be able to use LINQ, so if you need VB.NET code for the same, take a look here: Split large datatable into chunks in c# and vb.net

private static List<DataTable> SplitTable(DataTable mainTable, int batchSize)
{
    List<DataTable> tables = new List<DataTable>();
    int i = 0;
    int j = 1;
    int rowCount = 0;
    DataTable tempDt = mainTable.Clone();
    tempDt.TableName = "ChunkDataTable" + j.ToString();
    tempDt.Clear();
    foreach (DataRow row in mainTable.Rows)
    {
        rowCount += 1;
        DataRow newRow = tempDt.NewRow();
        newRow.ItemArray = row.ItemArray;
        tempDt.Rows.Add(newRow);
        i += 1;
        // Also flush when the last row is reached, so the final partial chunk is kept.
        if (i == batchSize || rowCount == mainTable.Rows.Count)
        {
            tables.Add(tempDt);
            j += 1;
            tempDt = mainTable.Clone();
            tempDt.TableName = "ChunkDataTable" + j.ToString();
            tempDt.Clear();
            i = 0;
        }
    }
    return tables;
}
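
A quick sanity check with the numbers from the question (assuming mainTable holds the 1123 rows):

List<DataTable> tables = SplitTable(mainTable, 225);
Console.WriteLine(tables.Count);         // 5
Console.WriteLine(tables[4].Rows.Count); // 223 - the final partial chunk is kept
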
Jeff D