I'm working on a parser that extracts values from source text. It does not know beforehand how many or which values it will get, i.e. the names of the variables, their count etc. can vary greatly. Each section of the source provides only some of the values, never a complete list. These values are currently stored in a list of a custom class, similar to KeyValuePair but written from scratch.
A sample of what is retrieved from the source:
Section 1:
KeyA = ValA1
KeyB = ValB1
KeyD = ValD1
Section 2:
KeyC = ValC2
Section 3:
KeyB = ValB3
KeyD = ValD3
etc.
Now, I'd like to show this information to the user as a DataGrid, in the form of:
| KeyA  | KeyB  | KeyC  | KeyD  |
+-------+-------+-------+-------+
| ValA1 | ValB1 |       | ValD1 |
|       |       | ValC2 |       |
|       | ValB3 |       | ValD3 |
Currently, I iterate through all values found in each section and check whether the column already exists - if not, I create a new column. If the column exists, I add the value to the respective row/column. Then I attach the resulting DataTable to the DataGrid as:
dg.ItemsSource = dt.AsDataView();
This works exactly as intended, but it is too slow.
I'd appreciate any thoughts on how I could speed this up - either the initial storing, the conversion to a DataTable, or some other way of binding the data that achieves the same presentation to the user.
C#, WPF, .NET framework 4.5
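To illustrate, the building step is roughly the following (a simplified sketch with illustrative names - sections, Values, Key/Value stand in for my actual classes; the real code is spread across the section classes described in the update below):
var dt = new DataTable();
foreach (var section in sections)             // every parsed section contributes one row
{
    DataRow row = dt.NewRow();
    foreach (var pair in section.Values)      // key/value pairs found in that section
    {
        if (!dt.Columns.Contains(pair.Key))   // create the column on first encounter
            dt.Columns.Add(pair.Key);         // string column by default
        row[pair.Key] = pair.Value;
    }
    dt.Rows.Add(row);
}
dg.ItemsSource = dt.AsDataView();             // bind only once the table is complete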
Update: All loading and processing is done beforehand. The prepared data is stored as a tree of processed sections. Each section holds, as one of its properties, a list of key/value pairs. Each section class knows how to populate a given DataTable with its values.
I.e., the data on the backend looks like:
File1
+ Section 1 on level 1
| + Section 1
| + Section 2
+ Section 2 on level 1
+ Section 3 on level 1
| + Section 1
| + Section 2
| + Section 3
| + Section 4
+ Section 4
File2 ...
Each Section has a method:
public void CollectValues(DataTable target) {...}
This is called by a higher-level element with some DataTable (initially empty, getting filled as it goes).
Each section contains an internal field:
private List<CustomValue> Values;
This holds all the already found & processed values as CustomValue instances. CustomValue ~= KeyValuePair, but with added processing routines.
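For reference, CustomValue is roughly this shape (a simplified sketch; the real class also carries the processing routines):
public class CustomValue
{
    public string Key { get; set; }     // variable name found in the source
    public string Value { get; set; }   // processed value for that variable
    // ... processing routines omitted
}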
So what happens is: CollectValues is called from the requested level (could be the top, could be any other) with an empty, unprepared DataTable. CollectValues iterates (foreach) through all values available in the list on the current level and adds them to the target DataTable one at a time; before adding each value it checks whether a DataColumn with the needed name already exists (target.Columns[pair.Key] != null) and creates the column if necessary. In metacode:
public void CollectValues(DataTable target)
{
    DataRow dr = target.NewRow();
    foreach (var pair in Values)
    {
        // create the column the first time this key is encountered
        if (target.Columns[pair.Key] == null)
            target.Columns.Add(pair.Key);
        dr[pair.Key] = pair.Value;
    }
    target.Rows.Add(dr);

    foreach (var child in Children)
        child.CollectValues(target);
}
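The call site is essentially (again simplified; requestedSection stands for whatever level the user asked to collect from):
var dt = new DataTable();
requestedSection.CollectValues(dt);   // fills the table from the requested level downwards
dg.ItemsSource = dt.AsDataView();     // attached only after the table is fully populated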
Why I suspect this specific part: collecting values is just one of several similar routines. The other routines crawl the same data set in the same way, retrieving other things (mostly working with lists, no DataTables), and all of them run near instantly. Collecting the DataTable, however, can take a few seconds per source before the resulting DataGrid gets populated.
The total number of Values rarely exceeds 1000 (roughly 10 columns by 100 rows). The DataTable is attached to the DataGrid only after it has been fully populated.
Just for info on sizes: sources are usually 2 to 10 files. Each source text can range from 100 KB to 100 MB; a usual file is around 1-2 MB. The backend data in memory is usually under 100 MB.
And to highlight it again: it's only the DataTable part that worries me. Highlights, sectioning, source retrieval, filtering etc. all work within my expectations. So I'm looking first of all for a way to optimize the conversion from lists of key/value pairs to a DataTable, or for a way to store those values differently in the first place (after processing) to speed up the process.
Hope this gives enough info. Not listing source currently to reduce size.