1

I have a large DataTable - around 15000 rows and 100 columns - and I need to set the values for some of the columns in every row.

// Creating the DataTable
DataTable dt = new DataTable();
for (int i = 0; i < COLS_NUM; i++)
{
    dt.Columns.Add("COL" + i);
}    
for (int i = 0; i < ROWS_NUM; i++)
{
    dt.Rows.Add(dt.NewRow());
}

// Setting several values in every row
Stopwatch sw2 = new Stopwatch();
sw2.Start();
foreach (DataRow row in dt.Rows)
{
    for (int j = 0; j < 15; j++)
    {
        row["Col" + j] = 5;
    }
}
sw2.Stop();

The measured time above is about 4.5 seconds. Is there any simple way to improve this?

Amit
  • When looping through `ROWS_NUM` and creating new rows, is it possible to initialize the column values there? That would save using the third loop. – Dave Zych Dec 15 '12 at 20:06
  • The initialization is just for this example, in the real case scenario the DataTable is coming from the DB. – Amit Dec 16 '12 at 12:22

3 Answers

2

Before you populate the data, call the BeginLoadData() method on the DataTable. When you have finished loading the data, call EndLoadData(). This turns off all notifications, index maintenance, and constraints, which will improve performance.
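Applied to the loop from the question, this might look roughly like the following sketch (reusing the `dt` built above):

// Suspend notifications, index maintenance and constraint checking
// for the duration of the bulk update.
dt.BeginLoadData();
foreach (DataRow row in dt.Rows)
{
    for (int j = 0; j < 15; j++)
    {
        row["COL" + j] = 5;
    }
}
dt.EndLoadData();   // turn notifications, indexes and constraints back on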

As an alternative, call BeginEdit() before updating each row, and EndEdit() when the editing for that row is complete.
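The per-row alternative would look roughly like this:

foreach (DataRow row in dt.Rows)
{
    row.BeginEdit();    // suspend change events while this row is edited
    for (int j = 0; j < 15; j++)
    {
        row["COL" + j] = 5;
    }
    row.EndEdit();      // commit the buffered changes for the row
}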

Here is a link with more information on improving DataSet performance: http://www.softwire.com/blog/2011/08/04/dataset-performance-in-net-web-applications/

user3308241
1

One improvement that I can think of is editing columns by their indices, rather than their names.

foreach (DataRow row in dt.Rows)
{
    for (int j = 0; j < 15; j++)
    {
        row[j] = 5;
    }
}

In an empirical test, your method runs in ~1500 milliseconds on my computer, and this index-based version runs in ~1100 milliseconds.

Also, see Marc's answer in this post:

Set value for all rows in a datatable without for loop

hattenn
  • This slightly improved the performance when I tested it (about 10%), although I'm still hoping for more. I'll accept if nothing more significant comes up. – Amit Dec 16 '12 at 12:21
  • I know it doesn't sound well but I think the best way would be to not load 15000 rows in a `DataTable`. Are you sure you need them all loaded at once? Maybe it's better to check the design of your system. – hattenn Dec 16 '12 at 13:47
  • The design in the specific scenario is not ideal, no doubt about it, but for now it was easier to make improvements instead of redesigning it. What I ended up doing - 1) Using column indexes instead of names. 2) When creating dynamic columns, giving them a default value (which applies to all the existing rows in practically zero time) and then assigning row values only when they're different than the default. – Amit Dec 17 '12 at 09:09
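
A rough sketch of the default-value trick mentioned in the comment above (the column name and values here are only illustrative):

// A column added with a DefaultValue fills every existing row at once;
// only rows that need a different value are then set explicitly.
DataColumn dynamicCol = new DataColumn("COL_DYNAMIC", typeof(int));
dynamicCol.DefaultValue = 5;
dt.Columns.Add(dynamicCol);          // all existing rows now read 5
dt.Rows[0]["COL_DYNAMIC"] = 7;       // override only where the value differs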
0

This depends on your business logic, which is not clear from your question. However, if you want to set the values for some of the columns in every row, try the following:

  • Create a separate temp column (or columns); you might create it in the same loop that builds the original DataTable.

  • Fill the new values into this column.

  • Delete the old column and insert the new one in its place.

This approach makes sense if you can predict the new values, if the same value applies to all rows (as in your example), or if the values repeat in some pattern; in those cases, adding a pre-filled column is much faster than looping over all the rows.
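A rough sketch of that column swap, reusing `COL0` from the question and a constant replacement value:

// Build a replacement column pre-filled via DefaultValue, then move it
// back into the position of the column it replaces.
int ordinal = dt.Columns["COL0"].Ordinal;
dt.Columns.Remove("COL0");

DataColumn replacement = new DataColumn("COL0", typeof(int));
replacement.DefaultValue = 5;        // applied to every existing row
dt.Columns.Add(replacement);
replacement.SetOrdinal(ordinal);     // restore the original column position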

Alaa Alweish