0

I apologize for the terrible question title; I honestly don't know how to phrase this question well. If anyone has any ideas, I'd love to change it to a more informative title.

I have a very large CSV, and I've been writing a script to pull chunks out of it to import into SQL Server. One of my functions is showing some strange behavior; here's the relevant code.

#<main script>
$Global:rowDelimiter = "`r`n"
$Global:nextBlockStart = ''
.
.
.
#copy rows from main CSV file
$chunkToImport = Get-ImportByBlock -blockSize $blockSize
$rowsInChunk = ([regex]::Matches($chunkToImport,"$rowDelimiter")).count
function Get-ImportByBlock {
    param(
        [int]$blockSize
    )
    process{
        Write-Host "Reading a block of $blockSize characters from csv..."
        $buffer = [char[]]::new($blockSize)
        $streamReader.ReadBlock($buffer,0,$blockSize)
        $bufferString = $buffer -join ''
        $streamReader.DiscardBufferedData()

        $Global:lastRowEndLoc = $bufferString.LastIndexOf("$rowDelimiter")
        If ($lastRowEndLoc -le 0) {
            Write-Host "Error: No complete rows found."
            $lastRowEndLoc = 0
        }
        $completeRows = $nextBlockStart + $($bufferString[0..$lastRowEndLoc] -join '')
        Write-Host "Block of $blockSize characters read. Last complete row of data ends at $lastRowEndLoc"

        $nextBlockStartStartLoc = $lastRowEndLoc + 2
        $nextBlockStartEndLoc = $bufferString.Length
        $nextBlockStart = $bufferString[$nextBlockStartStartLoc..$nextBlockStartEndLoc] -join ''
        Write-Host "Next block will start with the with the sub-block at locations $nextBlockStartStartLoc to $nextBlockStartEndLoc"

        Write-Host "Rows returned: $completeRows" -ForegroundColor DarkCyan
        Return $completeRows
    }
}

Let's say in this runthrough, $blockSize = 17365, using the following data:

ID | Name    | Color
A  | Alice   | Red
B  | Bob     | Purple
C  | Cindy   | Green

When I run this code, it behaves like this:

  • The Get-ImportByBlock function is called using $blockSize
  • After $completeRows = $nextBlockStart + $($bufferString[0..$lastRowEndLoc] -join '') is executed, $completeRows looks like this:
A,Alice,Red
B,Bob,Purple
C,Cindy,Green
  • When the code is on the line Write-Host "Rows returned: $completeRows" -ForegroundColor DarkCyan, $completeRows still looks the same way
  • The function returns, and before $rowsInChunk = ([regex]::Matches($chunkToImport,"$rowDelimiter")).count executes, $chunkToImport looks like this:
17365
A,Alice,Red
B,Bob,Purple
C,Cindy,Green

Where did that 17365 come from? Should I be calling Get-ImportByBlock using a different syntax?

Inflorescence
  • 86
  • 1
  • 7
  • 2
    It comes from `$streamReader.ReadBlock(...)`, change that statement to `[void]$streamReader.ReadBlock(...)` to suppress the extra output – Mathias R. Jessen Nov 02 '22 at 15:34
  • 3
    What you're seeing is the return value of `StreamReader.ReadBlock`, which returns the number of characters read -- and which you should not ignore if you want correct logic, because it's legal for this value to be *less* than what you asked for, thus giving you a bunch of null characters in your string if you're not careful. Any value evaluated inside a PowerShell function that's not assigned becomes part of the pipeline. `return` is there to prematurely end things, but it's not special in terms of actually producing a value. – Jeroen Mostert Nov 02 '22 at 15:35
  • Thank you so much for the answers! I can definitely see that pipeline/return information coming up again, so that is a huge piece of information to learn. Thank you! – Inflorescence Nov 02 '22 at 19:34

0 Answers0