
I came across a post discussing how to use PowerShell to bulk-import massive amounts of data relatively quickly. I have a typical CSV file with about 5 million rows, formatted in the usual way.

I keep getting the same error messages regardless of whether I import a .txt or a .csv file. Playing around with the csvdelimiter/firstRowColumnNames settings also created its own issues.

I've spent hours trying to figure out how to get it to work with MY CSV files, and I keep getting the same error messages no matter what I try. All field names accept NULL, and they are identical in every way between the table and the CSV file. I do not have a primary key on the table.

# Database variables
$sqlserver = "SERVERNAMEHERE"
$database = "autos"
$table = "AgedAutos"

# CSV variables
$csvfile = "C:\temp\aged.csv"
$csvdelimiter = "',"
$firstRowColumnNames = $true

################### No need to modify anything below ###################
Write-Host "Script started..."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew() 
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")

# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000

# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize

# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable

# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }

foreach ($column in $columns) { 
    $null = $datatable.Columns.Add()
}

# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null)  {
    $null = $datatable.Rows.Add($line.Split($csvdelimiter))
    $i++; if (($i % $batchsize) -eq 1) { 
        $bulkcopy.WriteToServer($datatable) 
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $datatable.Clear() 
    } 
} 

# Add in all the remaining rows since the last clear
if($datatable.Rows.Count -gt 0) {
         $bulkcopy.WriteToServer($datatable)
         $datatable.Clear()
}

# Clean Up
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
$datatable.Dispose()

Write-Host "Script complete. $i rows have been inserted into the database."
Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()

Error message listed below.

Exception calling "WriteToServer" with "1" argument(s): "The given value of type String from the data source cannot be converted to 
type date of the specified target column."
At C:\powershell_scripts\batch_csv_import-code1-working-test for auto table.ps1:43 char:3
+         $bulkcopy.WriteToServer($datatable)
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : InvalidOperationException

340000 rows have been inserted in 00:00:03.5156162

I have no idea what that error means since I cannot find anything useful on Google. I'm thinking one of the columns might be listed incorrectly in SQL Server, but I could be wrong.
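
One thing I may try is a rough diagnostic pass over the file (this is just a sketch, not part of the script above; $dateColumnIndex is a placeholder for whichever column actually holds the date) to find values that won't convert to a date:

# Diagnostic sketch only. $dateColumnIndex is a hypothetical placeholder for the
# position of the date column in the CSV; adjust it to match the real file.
$dateColumnIndex = 3
$lineNumber = 1
Get-Content $csvfile | Select-Object -Skip 1 | ForEach-Object {
    $lineNumber++
    $value = ($_ -split ',')[$dateColumnIndex]
    $parsed = [datetime]::MinValue
    if (-not [datetime]::TryParse($value, [ref]$parsed)) {
        Write-Host "Line $lineNumber has a value that will not convert to a date: '$value'"
    }
}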

Please help me figure out the problem. Thanks.

J.R.
  • I would think SSIS would be the best tool for this. Importing a CSV into a database table is ETL 101 and SSIS performance is very good for this type of task. – Wes H Jan 18 '18 at 14:43
  • Somewhere a string that maps to a date field has an invalid value. – David Browne - Microsoft Jan 18 '18 at 15:07
  • Good news is that I found the issue. One of the columns was not labelled correctly. Simple tweak and now it works great. The bad news is that all of the data is being imported into the first column instead of being split up into the different columns in my database. Does anybody know what command I might be able to use with the aforementioned code to ignore ANY commas found in the individual cells? The only thing I found here was from a few years ago and I couldn't get it to work. [link](https://stackoverflow.com/questions/15927291/how-to-split-a-string-by-comma-ignoring-comma-in-double-quotes) – J.R. Jan 18 '18 at 17:13 (see the sketch after these comments)
  • @J.R. - regex is the only "built in" way to parse true CSV in .NET. If somehow this does not work or is too slow, try these libraries: https://stackoverflow.com/questions/1941392/are-there-any-csv-readers-writer-libraries-in-c – Alex Jan 19 '18 at 02:23
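
As a follow-up to the comment thread above, here is a minimal sketch (not from the original script) of the regex approach from the linked question: it splits each line while ignoring commas inside double-quoted fields, assuming any embedded commas sit inside balanced double quotes.

# Sketch only: replace $line.Split($csvdelimiter) in the read loop with a regex
# split that ignores commas inside balanced double quotes, then strip the quotes.
$splitPattern = ',(?=(?:[^"]*"[^"]*")*[^"]*$)'

while (($line = $reader.ReadLine()) -ne $null) {
    $fields = $line -split $splitPattern | ForEach-Object { $_.Trim('"') }
    $null = $datatable.Rows.Add($fields)
}

The built-in Import-Csv cmdlet also handles quoted fields, but the regex split keeps the existing streaming StreamReader/batch structure of the script intact.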

1 Answer


You are getting all the data in the first column because your value for $csvdelimiter is incorrect. You have $csvdelimiter = "'," when it should be $csvdelimiter = ",".
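
In other words, only the delimiter line needs to change. A minimal sketch of the corrected CSV variable section from the question:

# CSV variables (corrected): the delimiter is a plain comma, with no stray quote
$csvfile = "C:\temp\aged.csv"
$csvdelimiter = ","
$firstRowColumnNames = $true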