
I am bulk copying records from a CSV file to a SQL table. The table has both varchar columns and real columns (based on the CSV attributes we are given).

Let's suppose the first 7 columns are foreign keys of type varchar(100), and the remaining 80+ columns are of the real datatype.

For the bulk copy, I used the Out-DataTable function because that's apparently the most efficient way to bulk copy (especially with our files containing thousands of records).

However, I am getting the following error:

Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to type real of the specified target column."

Now, I wish the error specified exactly which column, but based on my research the cause seems to be that the DataType is presumed to be string for all columns.

Verifying with the following: $column.DataType

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object
True     True     String                                   System.Object

So the question is: how do I tell the DataTable to keep the first 7 columns as string, but make the rest of them the real datatype?

Here is the code:

function Get-Type
{
    param($type)

    $types = @(
        'System.Boolean',
        'System.Byte[]',
        'System.Byte',
        'System.Char',
        'System.DateTime',
        'System.Decimal',
        'System.Double',
        'System.Guid',
        'System.Int16',
        'System.Int32',
        'System.Int64',
        'System.Single',
        'System.UInt16',
        'System.UInt32',
        'System.UInt64')

    if ($types -contains $type) {
        Write-Output "$type"
    }
    else {
        Write-Output 'System.String'
    }
} #Get-Type

function Out-DataTable 
{ 
    [CmdletBinding()] 
    param([Parameter(Position=0, Mandatory=$true, ValueFromPipeline = $true)] [PSObject[]]$InputObject) 
 
    Begin 
    { 
        $dt = new-object Data.datatable   
        $First = $true  
    } 
    Process
    {
        foreach ($object in $InputObject)
        {
            $DR = $DT.NewRow()
            foreach ($property in $object.PSObject.Properties)
            {
                # Only the first input object creates the columns; each
                # column's DataType is inferred from that object's values
                if ($First)
                {
                    $Col = New-Object Data.DataColumn
                    $Col.ColumnName = $property.Name.ToString()
                    if ($property.Value)
                    {
                        if ($property.Value -isnot [System.DBNull]) {
                            $Col.DataType = [System.Type]::GetType("$(Get-Type $property.TypeNameOfValue)")
                        }
                    }
                    $DT.Columns.Add($Col)
                }
                if ($property.GetType().IsArray) {
                    # Arrays are flattened to an XML string representation
                    $DR.Item($property.Name) = $property.Value | ConvertTo-Xml -As String -NoTypeInformation -Depth 1
                }
                else {
                    $DR.Item($property.Name) = $property.Value
                }
            }
            $DT.Rows.Add($DR)
            $First = $false
        }
    }
      
    End 
    { 
        Write-Output @(,($dt)) 
    } 
 
} #Out-DataTable

$SqlConnection = New-Object System.Data.SqlClient.SqlConnection
$SqlConnection.ConnectionString = $connectionstring
$SqlConnection.Open()

$CSVDataTable = Import-Csv $csvFile | Out-DataTable

# Build the SqlBulkCopy object and set the timeout to infinite
$sqlBulkCopy = New-Object ("Data.SqlClient.SqlBulkCopy") -ArgumentList $SqlConnection
$sqlBulkCopy.DestinationTableName = "$schemaName.[$csvFileBaseName]"
$sqlBulkCopy.BulkCopyTimeout = 0
$sqlBulkCopy.BatchSize = 50000

# This mapping makes sure the columns match up exactly, because SqlBulkCopy
# maps by ordinal rather than by column name by default. With a DataTable the
# mappings already seem to line up, but better safe than sorry, right? ;)
# https://stackoverflow.com/a/50995201/8397835
foreach ($column in $CSVDataTable.Columns) { $sqlBulkCopy.ColumnMappings.Add($column.ColumnName, $column.ColumnName) > $null }

$sqlBulkCopy.WriteToServer($CSVDataTable)

# Clean Up
$sqlBulkCopy.Close(); $sqlBulkCopy.Dispose()
$CSVDataTable.Dispose()
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()

Maybe something like this?

PseudoCode:

foreach ($column in $CSVDataTable.Columns) {
    if ($DestinationTable.Columns[$column.ColumnName].Type -eq 'Real') {
        $column.DataType = 'Real'
    }
    $sqlBulkCopy.ColumnMappings.Add($column.ColumnName, $column.ColumnName) > $null
}
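
Fleshed out, that idea might look something like this untested sketch: read each destination column's type from INFORMATION_SCHEMA.COLUMNS (reusing the open $SqlConnection and the $schemaName/$csvFileBaseName variables from above) and build a lookup to type the DataTable columns. Only real is mapped here; everything else stays string, and the real values would still need parsing before row assignment:

# Untested sketch: read destination column types and map them to .NET types
$cmd = $SqlConnection.CreateCommand()
$cmd.CommandText = "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = @schema AND TABLE_NAME = @table"
[void]$cmd.Parameters.AddWithValue('@schema', $schemaName)
[void]$cmd.Parameters.AddWithValue('@table', $csvFileBaseName)

$columnTypes = @{}
$reader = $cmd.ExecuteReader()
while ($reader.Read()) {
    # real maps to [Single]; every other SQL type is left as [String] here
    $columnTypes[$reader['COLUMN_NAME']] = if ($reader['DATA_TYPE'] -eq 'real') { [Single] } else { [String] }
}
$reader.Close()

# Inside Out-DataTable, the column type would then come from the lookup:
#   $Col.DataType = $columnTypes[$property.Name]
# and [Single] values would be parsed before assignment:
#   $DR.Item($property.Name) = [Single]::Parse($property.Value)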

2 Answers


Out-DataTable is inspecting the properties of the first input object...

foreach($property in $object.PsObject.get_properties())
{
    if ($first) 
    {  

...to determine the DataType of the corresponding DataColumn...

if ($property.value -isnot [System.DBNull]) { 
    $Col.DataType = [System.Type]::GetType("$(Get-Type $property.TypeNameOfValue)") 
} 

The problem is, the input objects are produced by Import-Csv...

$CSVDataTable = Import-Csv $csvFile | Out-DataTable

...which doesn't do any conversion of the CSV fields; every property will be of type [String], therefore, every DataColumn will be, too.
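
You can see this with a quick check against any CSV file (a throwaway diagnostic, using the same $csvFile variable as your script):

# Every property Import-Csv produces is a [String], numeric-looking or not
$firstRow = Import-Csv $csvFile | Select-Object -First 1
$firstRow.PSObject.Properties |
    ForEach-Object { '{0}: {1}' -f $_.Name, $_.Value.GetType().Name }
# Prints "<column name>: String" for every column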

The .NET equivalent of real is Single, so you either need to hard-code which columns (by name or ordinal) should be of type [Single]...

$objectProperties = @($object.PSObject.Properties)
for ($propertyIndex = 0; $propertyIndex -lt $objectProperties.Length; $propertyIndex++)
{
    $property = $objectProperties[$propertyIndex]
    if ($propertyIndex -lt 7) {
        $columnDataType = [String]
        $itemValue = $property.Value
    }
    else {
        $columnDataType = [Single]
        $itemValue = if ($property.Value -match '^\s*-\s*$') {
            [Single] 0
        } else {
            [Single]::Parse($property.Value, 'Float, AllowThousands, AllowParentheses')
        }
    } 

    if ($first) 
    {   
        $Col =  new-object Data.DataColumn   
        $Col.ColumnName = $property.Name
        $Col.DataType = $columnDataType

        $DT.Columns.Add($Col) 
    }

    $DR.Item($property.Name) = $itemValue
}

...or augment your detection logic...

foreach($property in $object.PSObject.Properties)
{
    $singleValue = $null
    $isSingle = [Single]::TryParse($property.Value, [ref] $singleValue)

    if ($first) 
    {   
        $Col =  new-object Data.DataColumn   
        $Col.ColumnName = $property.Name
        $Col.DataType = if ($isSingle) {
            [Single]
        } else {
            [String]
        }

        $DT.Columns.Add($Col) 
    }

    $DR.Item($property.Name) = if ($isSingle) {
        $singleValue
    } else {
        $property.value
    }
}

To comply with the column DataType, this code substitutes the parsed [Single] value for the original [String] property value when parsing succeeds. Note that I've removed the checks for [DBNull] and IsArray because they would never evaluate to $true since, again, Import-Csv only produces [String] properties.

The above assumes that if a property's value from the first input object can be parsed as a [Single] then the same is true for every input object. If that's not guaranteed, then you can do one pass through all input objects to determine the appropriate column types and a second pass to load the data...

function Out-DataTable
{ 
    End 
    {
        $InputObject = @($input)
        $numberStyle = [System.Globalization.NumberStyles] 'Float, AllowThousands, AllowParentheses'
        $dt = new-object Data.datatable 

        foreach ($propertyName in $InputObject[0].PSObject.Properties.Name)
        {
            $columnDataType = [Single]

            foreach ($object in $InputObject)
            {
                $singleValue = $null
                $propertyValue = $object.$propertyName
                if ($propertyValue -notmatch '^\s*-?\s*$' `
                    -and -not [Single]::TryParse($propertyValue, $numberStyle, $null, [ref] $singleValue))
                {
                    # Default to [String] if not all values can be parsed as [Single]
                    $columnDataType = [String]
                    break
                }
            }

            $Col =  new-object Data.DataColumn   
            $Col.ColumnName = $propertyName
            $Col.DataType = $columnDataType

            $DT.Columns.Add($Col) 
        }

        foreach ($object in $InputObject)
        { 
            $DR = $DT.NewRow()   
            foreach($property in $object.PSObject.Properties) 
            {   
                $DR.Item($property.Name) = if ($DT.Columns[$property.Name].DataType -eq [Single]) {
                    if ($property.Value -match '^\s*-?\s*$') {
                        [Single] 0
                    } else {
                        [Single]::Parse($property.Value, $numberStyle)
                    }
                } else {
                    $property.value
                }
            }   
            $DT.Rows.Add($DR)   
        } 

        Write-Output @(,($dt)) 
    }  

} #Out-DataTable
  • Yeah, after spending some time researching, it seems Import-Csv makes everything a string, regardless of the datatype :/ I don't want to hard-code, unfortunately, because the process is supposed to be dynamic; the CSV file might change at any time and users can't hardcode this. Btw, where would I place this piece of code? Thank you for telling me about Single, because I was thinking the equivalent is Decimal according to the documentation: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_numeric_literals?view=powershell-7#real-literals – Cataster May 08 '20 at 05:29
  • The last snippet would replace the second snippet inside the `if ($first) { }` block. Come to think of it you'll also need to replace `$property.Value` with the parsed value because I don't think `DataTable` converts it for you even if it knows the data type. I think you're confusing the `real` SQL data type with what that document refers to as "real literals" as a general term for those numeric types. You could use any of `[Single]` (aka `[Float]`), `[Double]`, or `[Decimal]` here; I just looked up which one specifically maps to the SQL `real` type due to the use of the `SqlBulkCopy` class. – Lance U. Matthews May 08 '20 at 05:39
  • I'm getting this now: `Exception setting "Item": "Exception calling "set_Item" with "2" argument(s): "Input string was not in a correct format. Couldn't store in Report Year Column. Expected type is Single.""` at `$DR.Item($property.Name) = $property.value` – Cataster May 08 '20 at 05:43
  • Yeah, that's the lack of conversion I mentioned. I've updated the code to fix that. – Lance U. Matthews May 08 '20 at 05:55
  • I think [ref] is needed... because I got this: `Argument: '2' should be a System.Management.Automation.PSReference. Use [ref]. $isSingle = [Single]::TryParse($property.Value, $singleVa ...` – Cataster May 08 '20 at 05:58
  • Sorry, yes, that got lost in translation. The perils of programming in a web browser text field... – Lance U. Matthews May 08 '20 at 05:59
  • No worries, I come across that myself sometimes haha. Btw, it seems even with the conversion, I am still getting the error: `Exception setting "Item": "Exception calling "set_Item" with "2" argument(s): "Input string was not in a correct format. Couldn't store in Report Year Column. Expected type is Single.""` at `$property.value` – Cataster May 08 '20 at 06:01
  • Is `"BP19"` the value of an item in a `real`/`[Single]` column? That won't parse as a `[Single]`, of course. This all presumes, as the original code does, that the data type of an item in the first row is the same data type for the rest of that column. – Lance U. Matthews May 08 '20 at 06:05
  • The Report Year column is actually string/varchar, part of the first 7 columns basically (in this CSV). So maybe this code snippet is converting everything to [Single]? – Cataster May 08 '20 at 06:08
  • I just realize a regular integer, such as a year, will parse as a `[Single]`. Do all of your `real` values have a decimal part, or are there some whole numbers? – Lance U. Matthews May 08 '20 at 06:09
  • here is an example record: `OTHER_NON_VEHICLE_ITEM_ONONV,TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM,COS_AU,4F80,COP Argentina,BP19,Current,0,0,0,0,0,0, - ,0,0,0,-2215.729849` every comma represents a column – Cataster May 08 '20 at 06:11
  • In that line `BP19` is the 6th value. I'm guessing in the first (non-header) line the 6th value must be parsable to a floating-point number? I see in the question you ask "how do i tell the Datatable to allow the first 7 columns to be string, but the rest of them real datatype?" Is that still acceptable? Because here in the comments you say (or, at least, I understood you to mean) the type has to be determined dynamically from the run-time data. – Lance U. Matthews May 08 '20 at 06:18
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/213390/discussion-between-cataster-and-bacon). – Cataster May 08 '20 at 06:19
  • Hey, I just discovered a major glitch with this answer. For some reason, my other CSV files that have just varchar columns (whose primary keys are referenced by the CSV file we were working on last time) are resulting in an error; this is one of them: `Exception calling "WriteToServer" with "1" argument(s): "Violation of PRIMARY KEY constraint 'PK_COS'. Cannot insert duplicate key in object 'dbo.COS'. The duplicate key value is (0).` For some reason, this function is somewhere just converting the data to 0? So it seems all rows just contain 0's in the datatable? – Cataster May 11 '20 at 09:44
  • I tried the original Out-DataTable again after hours of testing, and it's true: that one works completely as expected, but the one you wrote to address the real datatype problem somehow breaks for those files. – Cataster May 11 '20 at 09:45
  • I'm not understanding. Every value of every row in the `DataTable` is `0`, or just the values of the columns with the foreign key relationship? You're running this code against a different CSV structure? The code, of course, is substituting `0` for `-` in every column deemed to consist of entirely `[Single]`-convertible values. If the value is converted on the foreign key side of the relationship but not the primary key side then that would likely break things, though I'm not sure why that would manifest as a "Cannot insert duplicate key" error instead of some kind of type mismatch error. – Lance U. Matthews May 11 '20 at 19:21
  • Could we meet in chat again? I tried the original file we used on May 8th, and it's resulting in this error now for some reason :( – Cataster May 11 '20 at 19:25
  • There was another method, using Write-SqlTableData; however, that just recognizes strings... To sanitize the real-datatype measures/accounts, Thomas's automated approach (https://stackoverflow.com/a/61729656/8397835) worked perfectly; however, I notice it's much slower than the bulk copy approach we worked on last week! Shame it stopped working all of a sudden :( Not even the first 2 rows are getting processed through this updated function; all I see is 0's inserted in the table for all fields. – Cataster May 11 '20 at 22:58
  • Are you using the code you ended up with at the end of our chat, or did you copy what I edited into the answer? If the latter, it's possible I missed incorporating something along the way. I see that answer is using `Add-Member` to construct the output object property-by-property, which is a very slow way to do that compared to `New-Object -TypeName 'PSObject' -Property @{ Foo = 'bar' }` or `[PSCustomObject] @{ Foo = 'bar' }`. It might be a good idea (if not too late) to link from that question to this one so people can see what ground has already been covered/what code you're working with. – Lance U. Matthews May 11 '20 at 23:23
  • Yeah, I copied what you had here as my final code version as well. – Cataster May 11 '20 at 23:30
  • I just diffed the final code you posted in chat with my answer and the only difference is you had left in the `try { ... } catch { Write-Warning ... }` (plus a tiny fix I just made when matching `-` fields). I can't explain why the code wouldn't work any more _unless_ you're not running it against the same file you were testing with last week and this input CSV has other quirks this code isn't accounting for. – Lance U. Matthews May 11 '20 at 23:37
  • I assure you, I am trying this on just the first 2 rows, which we were testing on in the chat, and they did not change, and nope... it just inserts 0... That's why I'm very confused about what could possibly be happening. I even have other individual CSV files that have the primary keys, and when I run this function on them, I get an error: Violation of PRIMARY KEY constraint, duplicate key value (0), even though I have no 0's whatsoever in these other CSV files, since they are all pretty much strings/words. After I observed the 0's in the fact table, which has the reals, I realized it's inserting 0's – Cataster May 11 '20 at 23:40
  • And since it's trying to insert 0's for all fields, it makes sense why I am getting the violation of primary key constraint error on those other files/tables. The question is: why is it attempting to insert 0's... – Cataster May 11 '20 at 23:42
  • Try it with the fix I just put in. When I was searching for `-` fields I wasn't anchoring it to the entire field so it would replace values that contained a `-` anywhere with `0`, though only when the entire column was deemed to be convertible to `[Single]`. – Lance U. Matthews May 11 '20 at 23:46

I had a similar challenge after importing XML data via .ReadXml() because the XML included empty strings instead of DBNull. I ran a lot of tests to get the conversion as fast as possible, and for me this worked best:

  1. Create a DataTable with all columns as strings for the data import.
  2. Create the same columns again with slightly different names and the correct target type in the same table, referencing the string column (e.g. if the first column of type string is named "c1", then I named the new column of type real "c1_").
  3. During creation of each column in step 2, also set an expression like 'IIF(LEN([c1])=0,NULL,[c1])' that solves the "empty-string" dilemma.
  4. Now do a bulk import, and finally export only the referencing columns via a DataTableReader into its own DataTable (a sketch of these steps follows below).
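
For illustration, here is a rough sketch of what this looks like for a single column (the names are examples, not my original code, and I've used an explicit CONVERT in the expression as one way to produce the typed value):

$dt = New-Object System.Data.DataTable

# Step 1: string column that receives the raw imported value
[void]$dt.Columns.Add('c1', [String])

# Steps 2 and 3: typed shadow column whose expression maps empty strings
# to NULL and converts everything else to the target type
$c1_ = New-Object System.Data.DataColumn
$c1_.ColumnName = 'c1_'
$c1_.DataType   = [Single]
$c1_.Expression = "IIF(LEN([c1])=0, NULL, CONVERT([c1], 'System.Single'))"
$dt.Columns.Add($c1_)

# Step 4 (abridged): after loading, read only the typed columns back out
$row = $dt.NewRow(); $row['c1'] = '3.14'; $dt.Rows.Add($row)
$dt.Rows[0]['c1_']   # 3.14 as [Single]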