
I'm exporting about 1,000,000 rows via an Invoke-Sqlcmd script, and the result is a CSV of around 180 MB.

What I would like to do instead is export it as ten 100,000-row files. How do I go about doing this? It doesn't seem to be a feature of Export-Csv or Invoke-Sqlcmd that I can find.

Current code:

$dataSubset = "mydata"
$localFile = ("{0}.csv" -f $dataSubset)
$localPath = "my\path"
$serverInstance = "myserver.domain.com"
$database = "mydatabase"

$Query = @"
my crazy ass query
"@
$Results = Invoke-Sqlcmd -QueryTimeout 0 -ServerInstance $ServerInstance -Database $database -Query $Query
$Results | Export-csv $localPath/$localFile -NoTypeInformation

Instead of having a single mydata.csv I want to have mydata_1.csv, mydata_2.csv, etc.


3 Answers


Use the Select-Object cmdlet (whose built-in alias is select); it has -Skip and -First parameters:

for ($i = 0; $i -lt 10; $i++) {
    $localFile = ("{0}_{1}.csv" -f $dataSubset, $i)
    # Skip the rows already written, then take the next 100,000.
    $Results | Select-Object -Skip ($i * 100000) -First 100000 | Export-Csv $localPath/$localFile -NoTypeInformation
}
Mark Arend
    Good pointers, though you probably don't want to send the entire 1-million-object array through the pipeline for each and every batch; ideally, you'd avoid collecting all 1 million objects in memory to begin with. – mklement0 May 21 '20 at 22:32

Unfortunately, as of PowerShell 7.0, there is no batching (chunking) mechanism in PowerShell.

  • This GitHub issue suggests adding a -ReadCount parameter to Select-Object to enable batching.

For now, you'll have to implement your own:

# ... 

$batchSize = 10000
$fileNdx = 0

# The list to hold a batch.
$batch = [Collections.Generic.List[object]]::new($batchSize)

# The script block for exporting a single batch to a CSV.
# Exports to .../mydata_1.csv, .../mydata_2.csv, ... 
$sb = { 
        ++$fileNdx; 
        $batch | Export-Csv -NoTypeInformation "$localPath/mydata_$fileNdx.csv" 
      }

Invoke-Sqlcmd -QueryTimeout 0 -ServerInstance $ServerInstance -Database $database -Query $Query |
  ForEach-Object -Process {
    $batch.Add($_) # add object to current batch
    if ($batch.Count -eq $batchSize) { # batch is complete
      . $sb # export
      $batch.Clear() # start new batch
    }
  } -End { 
    if ($batch.Count) { # final, incomplete batch
      . $sb # export
    }
  }

Note that you should generally avoid $Results = Invoke-Sqlcmd ... with large queries, because it invariably collects the entire result set in memory as a whole; instead, use the pipeline, with its object-by-object processing, as shown above.
That said, if you have enough memory, capturing all input at once can situationally improve performance.

mklement0

I don't think that PowerShell is the way to do this. Can't you export directly from SQL Server via its command-line integration?
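For example, something along these lines could be used (just a sketch, not part of the original answer; it assumes the bcp utility is installed, Windows authentication, and a sortable Id column on a hypothetical mydatabase.dbo.MyTable to page on):

# Sketch: have SQL Server write each chunk directly via bcp, paging with OFFSET/FETCH.
# mydatabase.dbo.MyTable and the Id column are placeholders for the real query.
for ($i = 0; $i -lt 10; $i++) {
    $offset = $i * 100000
    $chunkQuery = "SELECT * FROM mydatabase.dbo.MyTable ORDER BY Id OFFSET $offset ROWS FETCH NEXT 100000 ROWS ONLY"
    $outFile = Join-Path $localPath ("{0}_{1}.csv" -f $dataSubset, ($i + 1))
    bcp $chunkQuery queryout $outFile -c -t"," -S $serverInstance -T   # -T = trusted (Windows) connection
}

Note that bcp writes neither a header row nor quoted fields, so the output differs slightly from what Export-Csv produces.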

Anyway, concerning your question, another alternative is:

$length = $Results.Count

$items = 10
$stepWidth = [Math]::Ceiling($length / $items)   # rows per file

for ($i = 1; $i -le $items; $i++) {
    $firstItem = ($i - 1) * $stepWidth
    $lastItem  = ($i * $stepWidth) - 1
    if ($lastItem -ge $length) { $lastItem = $length - 1 }   # clamp the final chunk
    echo ("FI: " + $firstItem + " LI: " + $lastItem)
    $localFile = ("{0}_{1}.csv" -f $dataSubset, $i)
    $Results[$firstItem..$lastItem] | Export-Csv $localPath/$localFile -NoTypeInformation
}

(not tested, but in the end similar to Mark Arend's solution)

Another alternative would be to use the .NET method System.Array.Copy: https://learn.microsoft.com/de-de/dotnet/api/system.array.copy?view=netcore-3.1
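For instance (a minimal sketch of that idea, not from the answer; the 100,000-row chunk size and the file-name pattern are taken from the question):

# Sketch: copy 100,000-element slices of $Results into a fresh array with [Array]::Copy,
# then export each slice. Assumes $Results, $localPath and $dataSubset from the question.
$chunkSize = 100000
$total = $Results.Count
$fileNo = 0
for ($offset = 0; $offset -lt $total; $offset += $chunkSize) {
    $count = [Math]::Min($chunkSize, $total - $offset)
    $chunk = New-Object object[] $count
    [Array]::Copy($Results, $offset, $chunk, 0, $count)   # source, sourceIndex, dest, destIndex, length
    $fileNo++
    $chunk | Export-Csv ("{0}/{1}_{2}.csv" -f $localPath, $dataSubset, $fileNo) -NoTypeInformation
}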

Farbkreis