I have imported a very large CSV table via PowerShell into an array of objects. Each object looks like this:
$test = [PsCustomObject]@{id='1';name='a'}
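For context, these objects come straight out of Import-Csv, which always delivers every property value as a string; a minimal repro ('data.csv' is a placeholder file name):
# 'data.csv' contains the two lines 'id,name' and '1,a'
$data = Import-Csv -Path 'data.csv'
$data[0].id.GetType().Name   # -> String, although the value looks numeric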
The problem is that all properties are of type 'string', and I need to work with the correct types later on in my code. For this I want to use embedded C# code that converts the values of all object properties into a string array, so I can add its values to a DataTable with correct type definitions.
I am now struggling with the C# part. Here is my code sample, which is not working. How can I change the C# part to convert the object values into an array?
# this code should convert the object values into an array:
Add-Type -ReferencedAssemblies System.Data -TypeDefinition '
using System;
public class row {
    public object[] row(object test) {
        int id = Int32.Parse(test.id);
        string name = test.name;
        return (id, name);
    }
}'
# this is the test object:
$test = [PsCustomObject]@{id='1';name='a'}
# this is the final table with correct types:
$table = [System.Data.DataTable]::new()
[void]$table.Columns.Add('id', [int])
[void]$table.Columns.Add('name', [string])
# at the end it should work like this:
$row = [row]::new($test)
$table.Rows.Add($row)
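Side note: once the columns are typed, the DataTable itself converts string values when a row is added, so plain strings would be enough. A quick check, reusing the $table defined above:
[void]$table.Rows.Add('1', 'a')   # the string '1' is converted to [int] by the typed column
$table.Rows[0].id.GetType().Name  # -> Int32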
I did some tests without C#, but this is very slow. For example, this simple loop (even without adding the data to the row) runs for over 20 seconds:
$test = [PsCustomObject]@{id='1';name='a'}
foreach($x in 1..1MB) {
    $row = foreach($i in $test.PsObject.Properties.Value){ if ($i) {$i} else {$null} }
    #[void]$table.Rows.Add($row)
}
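For reproducibility, the timing can be taken with Measure-Command around the same loop (1MB equals 1048576 iterations):
# measure the pure PowerShell property loop
(Measure-Command {
    foreach($x in 1..1MB) {
        $row = foreach($i in $test.PsObject.Properties.Value){ if ($i) {$i} else {$null} }
    }
}).TotalSeconds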
So in theory I need to do the same as in the last code block, but via embedded C# code.
How can I get this done in an efficient way?
Update #1: Thanks to the input from Theo I could speed up the conversion. I did not expect that this would be five times faster than just querying the PsObject properties. For example, it turns out that an 'else' statement is slower than just assigning the variable first. Here is the code for comparison:
$test = [PsCustomObject]@{id='1';name='a'}
foreach($x in 1..1MB) {
    $id = $test.id
    if ([string]::IsNullOrEmpty($id)) {$id = $null}
    $name = $test.name
    $row = @($id, $name)
}
But it is still the slowest part of my overall code, so I am still looking for a smart C# solution. My idea is that if the input objects get additional properties later, I can simply rebuild the C# code dynamically, as sketched below. That would not work with pure PowerShell code.
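To illustrate the idea, a rough sketch that generates the type definition from whatever properties the input object actually has (the class names 'csvRecord' and 'DynamicParser' are placeholders, and the empty-string handling is omitted for brevity):
# build the C# source dynamically from the object's property names
$props  = @($test.PsObject.Properties.Name)
$fields = ($props | ForEach-Object { "public string $_ {get;set;}" }) -join "`n    "
$items  = ($props | ForEach-Object { "data.$_" }) -join ', '
Add-Type -TypeDefinition @"
public class csvRecord {
    $fields
}
public static class DynamicParser {
    public static string[] ParseToArray(csvRecord data) {
        return new string[] { $items };
    }
}
"@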
Update #2: Based on the input from BACON I was able to solve the challenge with the C# code. Here is my working implementation:
Add-Type -TypeDefinition '
using System;
public class test {
    public string id {get;set;}
    public string name {get;set;}
}
public static class DataParser {
    public static string[] ParseToArray(test data) {
        string id = data.id;
        if (String.IsNullOrEmpty(id)) {id = null;}
        string name = data.name;
        return new string[] {id, name};
    }
}'
# this is the test object:
$test = [PsCustomObject]@{id='';name='a'}
$timer = [System.Diagnostics.Stopwatch]::StartNew()
foreach($x in 1..1MB) {
    $row = [DataParser]::ParseToArray($test)
}
$timer.Stop()
$timer.Elapsed.TotalSeconds
What I did not expect is the runtime of this solution: it is way slower than the pure PowerShell version I posted above, most likely because PowerShell has to convert the PsCustomObject into the C# 'test' class on every single call. So my conclusion is 'mission accomplished, but we missed the target': there seems to be no really efficient way to convert object values into an array.
As a result of this, I will step away from importing CSV data as objects and will focus on importing large data as XML via 'DataSet.ReadXml'. I only wish there were a built-in option to directly import CSV data as arrays or DataRows.
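For reference, a minimal sketch of that direction, assuming the XML was written with an inline schema (e.g. via DataSet.WriteXml with XmlWriteMode.WriteSchema) so the column types survive the round trip ('data.xml' is a placeholder):
# read the XML straight into typed DataTables
$dataSet = [System.Data.DataSet]::new()
[void]$dataSet.ReadXml('data.xml')
$dataSet.Tables[0]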