
I have a list of objects that needs to be converted into a DataTable.

The collection contains 20K items or more.

When I try to iterate the collection using Parallel.For, it just hangs and takes too long.

Can anyone suggest the best way to convert a List of objects to a DataTable optimally?

test

2 Answers


If you already have the objects in memory and must convert them to a DataTable, you are pretty screwed: DataTable isn't thread safe

https://social.msdn.microsoft.com/Forums/en-US/ddcdac9d-35e7-4b9f-a367-242bf60c42f2/faq-item-is-datatable-thread-safe

And you are doubling up your memory usage.

My only suggestion would be that perhaps you can wrap your existing collection in an object inheriting from DataTable and override or hide the methods so that they reference your underlying list.

However, I think this is unlikely to be a 'good' or easy solution to your problem. The best approach would be to remove the need for the DataTable altogether.
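
For what it's worth, here is a rough sketch of the shape of that idea, assuming a hypothetical MyItem type with a single Value property and a hypothetical ListBackedTable/GetRow naming. Most DataTable members are not virtual, so anything else you depend on (Rows, Select, ...) would have to be hidden with `new` and re-pointed at the list, which is exactly why I doubt this is a clean solution:

    using System.Collections.Generic;
    using System.Data;

    // Hypothetical item type standing in for your actual object.
    public class MyItem
    {
        public string Value { get; set; }
    }

    // Derives from DataTable but keeps the original list and only
    // materializes a DataRow when one is asked for, so the data is
    // not copied up front.
    public class ListBackedTable : DataTable
    {
        private readonly List<MyItem> _source;

        public ListBackedTable(List<MyItem> source)
        {
            _source = source;
            Columns.Add("Value", typeof(string));
        }

        // Length of the underlying list, without creating any rows.
        public int SourceCount => _source.Count;

        // Builds a row on demand from the underlying list.
        public DataRow GetRow(int index)
        {
            DataRow dr = NewRow();
            dr["Value"] = _source[index].Value;
            return dr;
        }
    }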

Ewan
  • We have an existing collection of objects (length > 20K) and I need to convert this list into a DataTable for further processing. What is the best way to do this? – test May 01 '15 at 08:57
  • Ahh, sorry, I misunderstood. – Ewan May 01 '15 at 09:01
  • While DataTable operations are not thread safe, I think one can still parallelize the work in question and benefit from it if the conversion from custom object to a DataRow is expensive. Please see my answer for details – anikiforov May 01 '15 at 13:07
  • Nice, but really it's equivalent to my first answer: "If you are doing heavy work per row, separate this out." I realize this isn't a great answer unless I add a good method of wrapping the object – Ewan May 01 '15 at 13:46

While DataTable operations (including .NewRow()) are not thread safe, your work can still be parallelized using thread-local variables in the parallel loop:

List<string> source = Enumerable.Range(0, 20000).Select(i => i.ToString()).ToList();
DataTable endResult = CreateEmptyTable();
object lck = new object();

Parallel.For(
    0, source.Count,
    () => CreateEmptyTable(), // method to initialize the thread-local table
    (i, state, threadLocalTable) => // method invoked by the loop on each iteration
    {
        DataRow dr = threadLocalTable.NewRow();

        // running in parallel can only be beneficial 
        // if you do some CPU-heavy conversion in here
        // rather than simple assignment as below
        dr[0] = source[i];

        threadLocalTable.Rows.Add(dr);
        return threadLocalTable;
    },

    // Method to be executed when each partition has completed. 
    localTable =>
    {
        // lock to ensure that the result table 
        // is not screwed by merging from multiple threads simultaneously
        lock (lck)
        {
            endResult.Merge(localTable);
        }
    }
);

where

    private static DataTable CreateEmptyTable()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("MyString");
        return dt;
    }

However, parallel execution will only be beneficial if the time saved on the conversion from your object instance to a DataRow is greater than the time lost on joining the results at the end of the execution (locks + DataTable merges). That is only possible if your conversion is somewhat CPU-heavy. In my example the conversion (dr[0] = source[i]) is not CPU-heavy at all, so sequential execution is preferable.

PS: the above example, modified to run sequentially, completes in under 20 ms on my Intel Core i7-3537U. If your sequential execution times are low, you may not want to bother with parallel execution at all.
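
For reference, the sequential variant mentioned in the PS is essentially just a plain loop over the same source list (no per-thread tables, no lock, no merge); a sketch reusing CreateEmptyTable() from above:

    List<string> source = Enumerable.Range(0, 20000).Select(i => i.ToString()).ToList();
    DataTable endResult = CreateEmptyTable();

    for (int i = 0; i < source.Count; i++)
    {
        DataRow dr = endResult.NewRow();
        dr[0] = source[i];   // same trivial conversion as in the parallel example
        endResult.Rows.Add(dr);
    }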

anikiforov
  • You can also limit the parallel tasks so it doesn't eat up your CPUs. http://stackoverflow.com/questions/9290498/how-can-i-limit-parallel-foreach – Jacob Roberts May 01 '15 at 13:02
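
To sketch what that comment suggests (assuming the same source, lck, endResult, and CreateEmptyTable() from the answer above): the thread-local overload of Parallel.For also accepts a ParallelOptions instance, so the number of worker threads can be capped with MaxDegreeOfParallelism:

    // Cap the loop at two concurrent workers instead of one per core.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

    Parallel.For(
        0, source.Count,
        options,
        () => CreateEmptyTable(),               // thread-local table init
        (i, state, threadLocalTable) =>         // per-iteration body
        {
            DataRow dr = threadLocalTable.NewRow();
            dr[0] = source[i];
            threadLocalTable.Rows.Add(dr);
            return threadLocalTable;
        },
        // merge each thread's table into the result under a lock
        localTable => { lock (lck) { endResult.Merge(localTable); } }
    );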