I apologize for the terrible question title; I honestly don't know how to phrase this question well. If anyone has any ideas, I'd love to change it to a more informative title.
I have a very large CSV, and I've been writing a script to pull chunks out of it to import into SQL Server. One of my functions is showing some strange behavior; here's the relevant code.
#<main script>
$Global:rowDelimiter = "`r`n"
$Global:nextBlockStart = ''
.
.
.
#copy rows from main CSV file
$chunkToImport = Get-ImportByBlock -blockSize $blockSize
$rowsInChunk = ([regex]::Matches($chunkToImport,"$rowDelimiter")).count
function Get-ImportByBlock {
param(
[int]$blockSize
)
process{
Write-Host "Reading a block of $blockSize characters from csv..."
$buffer = [char[]]::new($blockSize)
$streamReader.ReadBlock($buffer,0,$blockSize)
$bufferString = $buffer -join ''
$streamReader.DiscardBufferedData()
$Global:lastRowEndLoc = $bufferString.LastIndexOf("$rowDelimiter")
If ($lastRowEndLoc -le 0) {
Write-Host "Error: No complete rows found."
$lastRowEndLoc = 0
}
$completeRows = $nextBlockStart + $($bufferString[0..$lastRowEndLoc] -join '')
Write-Host "Block of $blockSize characters read. Last complete row of data ends at $lastRowEndLoc"
$nextBlockStartStartLoc = $lastRowEndLoc + 2
$nextBlockStartEndLoc = $bufferString.Length
$nextBlockStart = $bufferString[$nextBlockStartStartLoc..$nextBlockStartEndLoc] -join ''
Write-Host "Next block will start with the with the sub-block at locations $nextBlockStartStartLoc to $nextBlockStartEndLoc"
Write-Host "Rows returned: $completeRows" -ForegroundColor DarkCyan
Return $completeRows
}
}
Let's say in this runthrough, $blockSize = 17365
, using the following data:
ID | Name | Color
A | Alice | Red
B | Bob | Purple
C | Cindy | Green
When I run this code, it behaves like this:
- The
Get-ImportByBlock
function is called using$blockSize
- After
$completeRows = $nextBlockStart + $($bufferString[0..$lastRowEndLoc] -join '')
is executed,$completeRows
looks like this:
A,Alice,Red
B,Bob,Purple
C,Cindy,Green
- When the code is on the line
Write-Host "Rows returned: $completeRows" -ForegroundColor DarkCyan
,$completeRows
still looks the same way - The function returns, and before
$rowsInChunk = ([regex]::Matches($chunkToImport,"$rowDelimiter")).count
executes,$chunkToImport
looks like this:
17365
A,Alice,Red
B,Bob,Purple
C,Cindy,Green
Where did that 17365 come from? Should I be calling Get-ImportByBlock
using a different syntax?