
I have a CSV file whose first 7 columns are varchar (string), while the rest are of the SQL 'real' data type. I believe the .NET equivalent of real is Single.

Anyway, I tried the following based on a suggestion:

,(Import-Csv -Path ".\Fact.csv") | Write-SqlTableData -ServerInstance "server.db.com" -DatabaseName "TestDB" -SchemaName "dbo" -TableName "Fact" -Force

and I received this (expected) error:

Write-SqlTableData : The given value of type String from the data source cannot be converted to type real of the specified target column.

The question is: how can I make this utility accept the real data type?

CSV file sample:

BGSC,MYTrim Code,CC,OC,Ctry Of Prod,Report Year,Scenario,Volume,Gross Revenue ,SA ,ANSR, TM  ,TL,"P and Warranty, others",CM ,Total Variable,Variable Profit , EBIT  
TOTAL_CKD,TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM,COS_AU,4F80, ,2015,Current, -   , -   , -   , -   , -   , -   , -   , -   , -   , -   ," (136,748)"
CXXXXXXXXXX,TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM,COS_AU,4F80, ,2013,Current, -   , -   , -   , -   , -   , -   , -   , -   , -   , -   ," 1,647 "
CXXXXXXXXXX,TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM,COS_AU,4F80, ,2014,Current, -   ," 16,098 ", (0)," 16,098 ", 1 , -   , 8 ," 16,090 ", -   ," 16,090 "," 61,219 "
CXXXXXXXXXX,TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM,COS_AU,4F80, ,2015,Current, -   ," (12,496)", -   ," (12,496)"," (13,386)", -   , -   , 890 , -   , 890 , (192)
Cataster
  • @Thomas please see the example I posted in my question. Dashes mean no value or 0; parentheses mean subtraction/a negative amount – Cataster May 11 '20 at 11:09
  • Which properties do you want to be "numbers"? And what localization do you use (e.g. is 1,234 greater than 1000 or less than 2)? – stackprotector May 11 '20 at 11:15
  • @Thomas basically something like this: `$numberStyle = [System.Globalization.NumberStyles] 'Float, AllowThousands, AllowParentheses'; if ($DT.Columns[$property.Name].DataType -eq [Single]) { if ($property.Value -match '\s*-?\s*') { [Single] 0 } else { [Single]::Parse($property.Value, $numberStyle) } } else { $property.value }` – Cataster May 11 '20 at 11:17
  • As an aside, your error here suggests a major design issue. You should *absolutely not* be using an approximate data type in your SQL data table if these are financial values and it looks like they are. You should be using `decimal` or `numeric`. `money` and `smallmoney` are okay, too, if you understand those types. `int` and `bigint` are fine, too. The data types of `real`, `float` should be *strictly avoided* for financial data because they do not adequately guarantee that the values stored are accurate nor will aggregates on those fields be accurate. Your math *will* be wrong. – Bacon Bits May 11 '20 at 14:28
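The parsing approach suggested in the comments can be tried as a small standalone experiment. A minimal sketch (the sample values are taken from the CSV above; `$numberStyle` is the style combination from the comment):

```powershell
# Parse accounting-style strings with NumberStyles: 'Float' tolerates the
# surrounding whitespace, 'AllowThousands' the commas, and
# 'AllowParentheses' turns "(136,748)" into -136748.
$numberStyle = [System.Globalization.NumberStyles] 'Float, AllowThousands, AllowParentheses'
$invariant   = [System.Globalization.CultureInfo]::InvariantCulture

$parsed = foreach ($raw in ' (136,748)', ' 1,647 ', ' -   ') {
    if ($raw -match '^\s*-\s*$') {
        [Single] 0                                  # a lone dash means "no value"
    } else {
        [Single]::Parse($raw, $numberStyle, $invariant)
    }
}
$parsed   # -136748, 1647, 0
```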

3 Answers


You need to cast the types using Select-Object.

See this article for details:

https://www.michev.info/Blog/Post/772/typecasting-for-import-csv
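For reference, the linked technique boils down to re-emitting each numeric property as a calculated property with an explicit cast. A minimal self-contained sketch (the object and property names here are illustrative, not taken from the article):

```powershell
# Typecasting with Select-Object: each calculated property re-creates the
# original (string) property with an explicit cast applied.
$row = [pscustomobject]@{ Name = 'A'; Value = '1.5' } |
    Select-Object -Property Name,
        @{ Name = 'Value'; Expression = { [Single] $_.Value } }

$row.Value.GetType().Name   # Single
```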

  • Hi Krilian, welcome to the community! In an answer, code is expected. Could you demonstrate a working example of the typecasting with Select-Object? – Cataster May 11 '20 at 11:13
  • Always quote the most relevant part of an important link in case the target site is unreachable or goes permanently offline. – somebadhat May 11 '20 at 14:00

The following code will convert your "numbers" to integers. I defined a function to sanitize your number strings. Then I used Select-Object to redefine some of your properties:

function Sanitize-String {
    param(
        $String
    )

    if ($String -match '^\s*-\s*$') {
        return "0"
    } else {
        return (($String -replace '[\"\s\,]', '') -replace '\((\d+)\)', '-$1')
    }
}

Import-Csv -Path ".\Fact.csv" | Select-Object -Property BGSC, 
    'MYTrim Code', 
    CC, 
    OC, 
    'Ctry Of Prod', 
    'Report Year', 
    Scenario, 
    @{Name='Volume';Expression={[int](Sanitize-String -String $_.Volume)}}, 
    @{Name='Gross Revenue';Expression={[int](Sanitize-String -String $_.'Gross Revenue ')}},
    @{Name='SA';Expression={[int](Sanitize-String -String $_.'SA ')}},
    @{Name='ANSR';Expression={[int](Sanitize-String -String $_.ANSR)}},
    @{Name='TM';Expression={[int](Sanitize-String -String $_.'TM ')}},
    @{Name='TL';Expression={[int](Sanitize-String -String $_.'TL')}},
    @{Name='P and Warranty, others';Expression={[int](Sanitize-String -String $_.'P and Warranty, others')}},
    @{Name='CM';Expression={[int](Sanitize-String -String $_.'CM ')}},
    @{Name='Total Variable';Expression={[int](Sanitize-String -String $_.'Total Variable')}},
    @{Name='Variable Profit';Expression={[int](Sanitize-String -String $_.'Variable Profit ')}},
    @{Name='EBIT';Expression={[int](Sanitize-String -String $_.'EBIT ')}}

It works, but to be honest, your CSV file is a mess. It is possible to have property names that contain space characters, but do you really need (multiple) leading or trailing spaces in a property name? Also, try to store numbers as raw as possible: don't wrap them in quotes, parentheses or spaces, and write 0 directly instead of -.


A more automated way:

function Sanitize-String {
    param(
        $String
    )

    if ($String -match '^\s*-\s*$') {
        return "0"
    } else {
        return (($String -replace '[\"\s\,]', '') -replace '\((\d+)\)', '-$1')
    }
}

$csvContent = Import-Csv -Path ".\Fact.csv" 

$results = @()
foreach ($line in $csvContent) {
    $i = 0
    $result = New-Object -TypeName psobject
    foreach ($prop in $line.PSObject.Properties) {
        if ($i -lt 7) {
            $result | Add-Member -MemberType $prop.MemberType -Name $prop.Name -Value $prop.Value
        } else {
            $result | Add-Member -MemberType $prop.MemberType -Name $prop.Name -Value ([long](Sanitize-String -String $prop.Value)) # [long] avoids Int32 overflow on large values
        }
        $i++
    }
    $results += $result
}
$results # pipe to your DB for example
stackprotector
  • thank you Thomas. actually, the csv file is extracted from a xlsx workbook provided by another source, and the accounts are automatically converted it seems to these characters for csv compatibility. one thing i was hoping, is it possible you could demonstrate an updated version where the columns are automatically parsed? instead of having to hardcode each column through an object expression, it would be nice to have it dynamically know that the first 7 columns are string, while the rest are "Real" – Cataster May 11 '20 at 12:31
  • Your `Sanitize-String` here is probably wrong. `(1234)` often means -1234. If so, you're changing negative values to positive. – Bacon Bits May 11 '20 at 14:14
  • @Thomas ya me neither, until i started my accounting class this weekend lol. i was just given this file and expected to know these representations. – Cataster May 11 '20 at 18:41
  • @Thomas ya so the parentheses mean negative number – Cataster May 11 '20 at 18:45
  • @Cataster Added support for numbers inside brackets being negative. – stackprotector May 11 '20 at 18:54
  • @Thomas thanks, btw, the report year, even though it a number, its actually just varchar (part of the first 7 columns i mention in my post). but i think i would just exclude that expression to sanitize it right? – Cataster May 11 '20 at 19:06
  • @Cataster Added a more automated way. – stackprotector May 11 '20 at 19:29
  • @Thomas Cannot convert value "3493889005" to type "System.Int32". Error: "Value was either too large or too small for an Int32." – Cataster May 11 '20 at 19:42
  • Thomas, thanks to your automated way of code, i just had to add Int64 but besides that, this is working perfectly! Thank you so much! if i may ask, instead of specifying "7" as the loop counter, could we make this even more dynamic to where it will peek through the columns, identify if it fits a 'Real' or String datatype, and act on (sanitize) it accordingly? – Cataster May 11 '20 at 22:28
  • @BACON and I worked on a very good dynamic function for Out_Datatable (https://stackoverflow.com/a/61672548/8397835) last week thats even faster, but idk what happened that it stopped working correctly anymore! all i see is 0's everywhere in the table...hes suggested he might have something missing in the final answer he posted compared to the code we were testing step by step in the chat discussion we had. i had no choice but to go with the Write-SQLTable built in function, wich is good but we'll have to apply the conversion on it once again like BACON did with Out-Datatable – Cataster May 11 '20 at 23:37
  • BACON's function is now working! When he was searching for - fields he wasn't anchoring it to the entire field so it would replace values that contained a - anywhere with 0, though only when the entire column was deemed to be convertible to [Single] – Cataster May 12 '20 at 00:07
  • @Cataster You can do it even more dynamically. But help me, how do you decide if "2015" should be a string or a number? ;-) So no, as a string may contain only digits, you cannot decide if that is a string that contains only digits or it is a string that should be converted to a number, because it contains only digits. You have to provide more definitions, like "there will never be a string that contains only digits". – stackprotector May 12 '20 at 05:49
  • @Thomas that's actually an excellent question :) not all values in the column, Report Year, where 2015 is one of the values, are just ”numbers”. Theres a value in report year for example called ”BP19”. The dynamic approach then would be to scan pretty much all the property values in a column to determine correct assignment of column data type. However, I could see a limitation with that assumption if we get a brand new set of data that may simply contain just ”number” years in Report Year, which is supposed to be varchar column. In that case, we can have an additional/another check ... – Cataster May 12 '20 at 07:12
  • @Thomas ...[continued] the other check could be if the values in the column are numbers but the next column contains strings, then this column with numbers is deemed string as well because it's part of the primary keys. But i could see yet another limitation with this if the column is positioned right before a Real data type column. Hmm...too many assumptions to account for lol – Cataster May 12 '20 at 07:12
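The column-scanning idea from this discussion could be sketched as follows (an illustrative heuristic, not code from any posted answer: a column counts as numeric only if every non-dash value in it parses as a number, so a stray 'BP19' keeps Report Year a string column):

```powershell
# Illustrative heuristic for the discussion above: a column is numeric only
# if every non-dash value in it parses under the accounting number styles.
function Test-NumericColumn {
    param($Rows, $Name)

    $style     = [System.Globalization.NumberStyles] 'Float, AllowThousands, AllowParentheses'
    $invariant = [System.Globalization.CultureInfo]::InvariantCulture

    foreach ($row in $Rows) {
        $value = [string] $row.$Name
        if ($value -match '^\s*-?\s*$') { continue }   # dash or empty means 0
        $parsed = [Double] 0
        if (-not [Double]::TryParse($value, $style, $invariant, [ref] $parsed)) {
            return $false                              # e.g. 'BP19' in Report Year
        }
    }
    return $true
}

$sample = ConvertFrom-Csv @'
Report Year,Volume
2015, -
BP19," 1,647 "
'@
Test-NumericColumn -Rows $sample -Name 'Report Year'   # False
Test-NumericColumn -Rows $sample -Name 'Volume'        # True
```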

I agree with Thomas that your CSV file is in really bad shape. The headers are surrounded with whitespace, as well as the data. Some of the numeric values are surrounded by parentheses, while others are not, and they may or may not include a thousands separator character , (in your locale, that is).

I would propose cleaning that all up like this:

$csv = Import-Csv -Path ".\Fact.csv"
# clean up the CSV
# get the header names from the first row, trim whitespaces and surround with quotes
$headers = ($csv[0].PSObject.Properties.Name.Trim()  | ForEach-Object { '"{0}"' -f $_ }) -join ','

# loop through all data rows in the csv. (each row is a PSObject with properties)
$data = foreach($item in $csv) {
    # loop through all properties (fields) in the item
    $row = foreach($field in $item.PSObject.Properties.Value) {
        '"{0}"'-f ($field -replace '[\s(),]' -replace '-', '0')
    }

    $row -join ','
}
# recombine the headers with the data
$newCsv = $headers, $data

# output on screen if you like
$newCsv | Format-Table -AutoSize

# save to a new file and use that for writing to the sql table
$newCsv | Export-Csv -Path ".\Fact2.csv" -NoTypeInformation
,(Import-Csv -Path ".\Fact2.csv") | Write-SqlTableData -ServerInstance "server.db.com" -DatabaseName "TestDB" -SchemaName "dbo" -TableName "Fact" -Force

This is what the CSV will look like after cleaning up:

"BGSC","MYTrim Code","CC","OC","Ctry Of Prod","Report Year","Scenario","Volume","Gross Revenue","SA","ANSR","TM","TL","P and Warranty, others","CM","Total Variable","Variable Profit","EBIT"
"TOTAL_CKD","TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM","COS_AU","4F80","","2015","Current","0","0","0","0","0","0","0","0","0","0","136748"
"CXXXXXXXXXX","TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM","COS_AU","4F80","","2013","Current","0","0","0","0","0","0","0","0","0","0","1647"
"CXXXXXXXXXX","TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM","COS_AU","4F80","","2014","Current","0","16098","0","16098","1","0","8","16090","0","16090","61219"
"CXXXXXXXXXX","TOTAL_MODEL_YEAR_AND_GLOBAL_TRIM","COS_AU","4F80","","2015","Current","0","12496","0","12496","13386","0","0","890","0","890","192"

and there should be no problem converting these values to type real.

A CSV file does not store value types; it only has headers and data fields. When imported, all data is of type String. The Write-SqlTableData cmdlet then tries to convert these values to the data types defined in the table.
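This is easy to verify with inline data (an illustrative sketch using ConvertFrom-Csv so that no file is needed):

```powershell
# Everything a CSV parser produces is a string; the cast is up to you.
$row = "Year,Volume`n2015,1647" | ConvertFrom-Csv

$row.Volume.GetType().Name              # String
([Single] $row.Volume).GetType().Name   # Single
```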

Theo
  • does write-sqltabledata really convert to relevant data types? i wish to say it works, but i am getting the same error even after the cleanup happened: `Write-SqlTableData : The given value of type String from the data source cannot be converted to type real of the specified target column.` – Cataster May 11 '20 at 19:00
  • @Cataster I can't of course check that.. Does your table really contain a column `P and Warranty, others` ? – Theo May 11 '20 at 19:20