I'm working on a parser that extracts values from source text. It does not know beforehand how many or which values it will get, i.e. the names of the variables, their count etc. can vary greatly. Each section of the source provides only some of the values, never a complete list. These values are currently stored in a list of a custom class, similar to KeyValuePair but written from scratch.
A sample of what is retrieved from the source:
Section 1:
KeyA = ValA1
KeyB = ValB1
KeyD = ValD1
Section 2:
KeyC = ValC2
Section 3:
KeyB = ValB3
KeyD = ValD3
etc.
Now, I'd like to show this information to the user as a DataGrid, in the form of:
| KeyA  | KeyB  | KeyC  | KeyD  |
+-------+-------+-------+-------+
| ValA1 | ValB1 |       | ValD1 |
|       |       | ValC2 |       |
|       | ValB3 |       | ValD3 |
Currently, I iterate through all values found in each section and check whether the column already exists - if not, I create a new column. If the column exists, I add the value to the respective row/column. Then I attach the resulting DataTable to the DataGrid as:
dg.ItemsSource = dt.AsDataView();
This works exactly as intended, but it is too slow.
I'd appreciate any thoughts on how I could speed this up - either the initial storing, the conversion to a DataTable, or some other way of binding the data that achieves the same presentation to the user.
C#, WPF, .NET framework 4.5
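To illustrate, the building step is roughly the following (a simplified sketch with illustrative names - sections, Values, Key/Value stand in for my actual classes; the real code is spread across the section classes described in the update below):
var dt = new DataTable();
foreach (var section in sections)             // every parsed section contributes one row
{
    DataRow row = dt.NewRow();
    foreach (var pair in section.Values)      // key/value pairs found in that section
    {
        if (!dt.Columns.Contains(pair.Key))   // create the column on first encounter
            dt.Columns.Add(pair.Key);         // string column by default
        row[pair.Key] = pair.Value;
    }
    dt.Rows.Add(row);
}
dg.ItemsSource = dt.AsDataView();             // bind only once the table is complete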
Update: All loading and processing is done beforehand. The prepared data is stored as a tree of processed sections. Each section holds, as one of its properties, a list of key/value pairs. Each section class knows how to populate a given DataTable with its values.
I.e., the data on the backend looks like:
File1
+ Section 1 on level 1
| + Section 1
| + Section 2
+ Section 2 on level 1
+ Section 3 on level 1
| + Section 1
| + Section 2
| + Section 3
| + Section 4
+ Section 4
File2 ...
Each Section has a method:
public void CollectValues(DataTable target) {...}
This is called by a higher-level element with some DataTable (initially empty, getting filled as it goes).
Each section contains an internal field:
private List<CustomValue> Values;
This holds all the already found & processed values as CustomValue instances. CustomValue ~= KeyValuePair, but with added processing routines.
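For reference, CustomValue is roughly this shape (a simplified sketch; the real class also carries the processing routines):
public class CustomValue
{
    public string Key { get; set; }     // variable name found in the source
    public string Value { get; set; }   // processed value for that variable
    // ... processing routines omitted
}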
So what happens is: CollectValues is called from the requested level (could be the top, could be any other) with an empty, unprepared DataTable. CollectValues iterates (foreach) through all values available in the list on the current level and adds them to the target DataTable one at a time; before adding each value it checks whether a DataColumn with the needed name already exists (target.Columns[pair.Key] != null) and creates the column if necessary. In metacode:
public void CollectValues(DataTable target)
{
    DataRow dr = target.NewRow();
    foreach (var pair in Values)
    {
        // create the column the first time this key is encountered
        if (target.Columns[pair.Key] == null)
            target.Columns.Add(pair.Key);
        dr[pair.Key] = pair.Value;
    }
    target.Rows.Add(dr);

    foreach (var child in Children)
        child.CollectValues(target);
}
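The call site is essentially (again simplified; requestedSection stands for whatever level the user asked to collect from):
var dt = new DataTable();
requestedSection.CollectValues(dt);   // fills the table from the requested level downwards
dg.ItemsSource = dt.AsDataView();     // attached only after the table is fully populated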
Why I suspect this specific part: collecting values is just one of several similar routines. The other routines crawl the same data set in the same way, retrieving other things (mostly working with lists, no DataTables), and all of them run near instantly. Collecting the DataTable, however, can take a few seconds per source before the resulting DataGrid gets populated.
The total number of Values rarely exceeds 1000 (roughly 10 columns by 100 rows). The DataTable is attached to the DataGrid only after it has been fully populated.
Just for info on sizes: sources are usually 2 to 10 files. Each source text can range from 100 KB to 100 MB; a usual file is around 1-2 MB. The backend data in memory is usually under 100 MB.
And to highlight it again: it's only the DataTable part that worries me. Highlights, sectioning, source retrieval, filtering etc. all work within my expectations. So I'm looking first of all for a way to optimize the conversion from lists of key/value pairs to a DataTable, or for a way to store those values differently in the first place (after processing) to speed up the process.
Hope this gives enough info. Not listing source currently to reduce size.