3

The following PowerShell command fails to copy the entire file; a few characters are always missing from the end.

[System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8).Write([System.IO.StreamReader]::new('C:\Temp\a.csv', [System.Text.Encoding]::GetEncoding('iso-8859-1')).ReadToEnd())

I suspect the writer never flushes its last buffered characters, because the following does copy the entire file:

$X = [System.IO.StreamReader]::new('C:\Temp\a.csv', [System.Text.Encoding]::GetEncoding('iso-8859-1'))
$Y = [System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8)
$Y.Write($X.ReadToEnd())
$X.Dispose()
$Y.Dispose()

Is it possible to dispose of (and flush) the reader & writer without having created variables to reference them?

EDIT: I tried this one-liner using streamreader/writer hoping the reader's read buffer would directly transfer to the writer's write buffer rather than waiting for the reader to read the entire file into memory and then write. What technique might achieve that?

I personally find code that does not declare a single-use object to often be cleaner / more succinct, but my focus is on understanding whether/how objects dispose of themselves, not the style.

There's no need to eschew variables or write on one line, but this behaviour isn't what I expected. In VBA one can copy a file like so and trust it will dispose of itself properly without having to declare a variable and explicitly flush (I think).

Sub Cpy()
With New Scripting.FileSystemObject
    .CreateTextFile("c:\Temp\Out.txt").Write .OpenTextFile("C:\Temp\In.txt", ForReading).ReadAll
End With
End Sub

One can achieve similar behaviour in a custom VBA class by writing appropriate 'clean-up' code in a Class_Terminate() procedure. I assumed the StreamWriter would similarly flush its data upon termination via garbage collection, once the line has executed and no variable references it any longer.

I also noticed that the file remains locked and I cannot delete it until I close the PowerShell session. Is there a way to flush the contents and release the file without having declared a variable to work with?

mklement0
alazyworkaholic
  • 2
    Why is it important to not use variables? – Mathias R. Jessen Jan 15 '22 at 15:05
  • 1
    Is there a specific need to use `StreamReader` and `StreamWriter` ? This should work well using the static methods of `System.IO.File` – Santiago Squarzon Jan 15 '22 at 15:16
  • 1
    Why are you trying to do it in one line? In 10 years time and over 10k PS scripts I haven't found a single realistic use case where I was forced to do things in a single line. Even when typing in the console you can open a newline in Powershell. Restricting yourself to a single line overcomplicates your code in most cases. – bluuf Jan 15 '22 at 18:15

2 Answers

4

This is possible, and easier, with the static methods of System.IO.File: WriteAllText() and ReadAllText().

The following queries the https://loripsum.net/ API to get random paragraphs and writes to a file using the iso-8859-1 encoding. Then reads that file and writes a copy using the same encoding and lastly compares both file hashes. As you can see reading and writing is all done as a one-liner.

The using statements can be removed but you would need to use the Fully Qualified Type Names.

Set location to a temporary folder for testing.

using namespace System.IO
using namespace System.Text

$fileRead  = [Path]::Combine($pwd.Path, 'test.txt')
$fileWrite = [Path]::Combine($pwd.Path, 'test-copy.txt')
$content   = Invoke-RestMethod 'https://loripsum.net/api/5/short/headers/plaintext'
$encoding  = [Encoding]::GetEncoding('iso-8859-1')

[File]::WriteAllText($fileRead, $content, $encoding)
[File]::WriteAllText($fileWrite, [File]::ReadAllText($fileRead, $encoding), $encoding)

(Get-FileHash $fileRead).Hash -eq (Get-FileHash $fileWrite).Hash # => Should be True

$fileRead, $fileWrite | Remove-Item
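
For the question's actual task, converting ISO-8859-1 to UTF-8, the same two static methods can be given different encodings. A minimal sketch, assuming throwaway file names in the temp folder (hypothetical, not taken from the question):

```powershell
# Hypothetical demo files in the temp folder; adjust the paths as needed.
$src = Join-Path ([System.IO.Path]::GetTempPath()) 'latin1-demo.txt'
$dst = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-demo.txt'
$latin1 = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$sample = "caf$([char]0xE9)"   # "cafe" with an accented e, a single byte (0xE9) in ISO-8859-1

[System.IO.File]::WriteAllText($src, $sample, $latin1)

# Read with one encoding, write with another; neither call leaves a file handle open.
[System.IO.File]::WriteAllText($dst, [System.IO.File]::ReadAllText($src, $latin1), [System.Text.Encoding]::UTF8)

# Inspect the result, then clean up.
$srcBytes  = [System.IO.File]::ReadAllBytes($src)
$dstBytes  = [System.IO.File]::ReadAllBytes($dst)
$roundTrip = [System.IO.File]::ReadAllText($dst, [System.Text.Encoding]::UTF8)
Remove-Item -LiteralPath $src, $dst
```

Note that [System.Text.Encoding]::UTF8 writes a byte-order mark; passing [System.Text.UTF8Encoding]::new($false) instead would omit it.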
Santiago Squarzon
  • Part of the task is to convert the file's encoding from iso-8859-1 to UTF8, so the files won't be identical. I'm also working with an extremely slow network shared drive (tops out at double-digit KB/s, it's a low-priority known issue out of my control) so I was hoping the streamreader/writer would speed up the task by reading/writing at the same time. – alazyworkaholic Jan 15 '22 at 22:15
  • 1
    @alazyworkaholic your initial script is not "reading and writing" at the same time, and this method is faster than `StreamReader` / `StreamWriter`. This method can also handle reading the file using one encoding and writing the file using a different encoding. – Santiago Squarzon Jan 16 '22 at 07:52
3
  • For the specific use case given, Santiago Squarzon's helpful answer is indeed the best solution: using the static methods of the static System.IO.File class obviates the need for file-representing instances that must be closed with .Close() or explicitly disposed of.

    • To read lazily and therefore support overlapping reading and writing, line by line, you can use the static [System.IO.File]::ReadLines() and [System.IO.File]::WriteAllLines() methods, but note that this approach (a) invariably uses platform-native [Environment]::NewLine-format newlines in the output file, irrespective of what newline format the input file uses, and (b) invariably adds a trailing newline in this format, even if the input file had no trailing newline.

    • Overcoming these limitations would require use of a lower-level, raw-byte API, System.IO.FileStream - which again requires explicit disposal (see bottom section).

  • Given that your approach reads the entire input file into memory first and then writes, you could even make do with PowerShell cmdlets, assuming you're running PowerShell (Core) 7+, which writes BOM-less UTF-8 files by default, and whose -Encoding parameter accepts any supported encoding, such as ISO-8859-1 in your case:

    # PowerShell (Core) 7+ only
    Get-Content -Raw -Encoding iso-8859-1 C:\TEMP\a.csv |
      Set-Content -NoNewLine C:\TEMP\b.csv                            
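
The lazy, line-by-line approach mentioned in the first bullet can be sketched as follows. The file names are hypothetical demo files in the temp folder, and the output is written as BOM-less UTF-8, which is an assumption about what is wanted:

```powershell
# Hypothetical demo files in the temp folder; adjust the paths as needed.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'lines-in-demo.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'lines-out-demo.csv'
$latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$utf8    = [System.Text.UTF8Encoding]::new($false)   # UTF-8 without a BOM

# Sample input: LF-only newlines, no trailing newline.
[System.IO.File]::WriteAllText($inPath, "id;name`n1;caf$([char]0xE9)", $latin1)

# ReadLines() enumerates lazily, so WriteAllLines() writes lines as they are read
# instead of waiting for the whole file to be loaded into memory.
[System.IO.File]::WriteAllLines($outPath, [System.IO.File]::ReadLines($inPath, $latin1), $utf8)

$result = [System.IO.File]::ReadAllText($outPath, $utf8)
Remove-Item -LiteralPath $inPath, $outPath
```

As described above, the output uses [Environment]::NewLine after every line, including the last, regardless of what the input file contained.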
    

As for your general question:

As of PowerShell (Core) 7.2.1:

  • PowerShell has no construct equivalent to C#'s using statement that allows automatic disposing of objects whose type implements the System.IDisposable interface (which, in the case of file I/O APIs, implicitly closes the files).

    • GitHub issue #9886 discusses adding such a statement, but the discussion suggests that it likely won't be implemented.

    • Note: While PowerShell does have a family of statements starting with keyword using, they serve different purposes - see the conceptual about_Using help topic.

  • A future PowerShell version will support a clean { ... } (or cleanup { ... }) block that is automatically called when an advanced function or script terminates, which allows performing any necessary function- or script-level cleanup (disposing of objects) - see RFC #294.

It is up to each type implementing the IDisposable interface whether it calls the .Dispose() method from the finalizer. Only if so is an object automatically disposed of eventually, by the garbage collector.

For System.IO.StreamWriter and also the lower-level System.IO.FileStream class, this appears not to be the case, so in PowerShell you must call .Close() or .Dispose() explicitly, which is best done from the finally block of a try / catch / finally statement.
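
The effect of the writer's buffering can be observed with a small experiment. This is only a sketch: the path is a hypothetical demo file, and the size seen before disposal depends on buffer sizes and file-system metadata, so no specific value is claimed for it:

```powershell
# Hypothetical demo file in the temp folder.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'flush-demo.txt'

# The (path, append, encoding) constructor: overwrite, BOM-less UTF-8.
$writer = [System.IO.StreamWriter]::new($path, $false, [System.Text.UTF8Encoding]::new($false))
try {
  $writer.Write('short text')   # lands in the writer's in-memory buffer first
  # Size on disk while the writer is still open; usually smaller than the final size.
  $lengthBefore = (Get-Item -LiteralPath $path).Length
}
finally {
  $writer.Dispose()             # flushes the buffer and releases the file
}

$contentAfter = [System.IO.File]::ReadAllText($path)
"Before Dispose(): $lengthBefore bytes; after Dispose(): $($contentAfter.Length) characters"
Remove-Item -LiteralPath $path  # succeeds because the file is no longer locked
```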

You can cut down on the ceremony somewhat by combining the aspects of object construction and variable assignment, but a robust idiom still requires a lot of ceremony:

$x = $y = $null
try {
  ($y = [System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8)).
    Write(
      ($x = [System.IO.StreamReader]::new('C:\Temp\a.csv', [System.Text.Encoding]::GetEncoding('iso-8859-1'))).
        ReadToEnd()
    )
} finally {
  if ($x) { $x.Dispose() }
  if ($y) { $y.Dispose() }
}
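
Regarding the question's EDIT about transferring data from the reader to the writer without first loading the whole file: one possible sketch is to combine the same try / finally pattern with a fixed-size character buffer. The file names, the 4096-character buffer size and the BOM-less UTF-8 output are assumptions chosen for illustration:

```powershell
# Hypothetical demo files in the temp folder; adjust the paths as needed.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'chunk-in-demo.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'chunk-out-demo.csv'
$latin1  = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$utf8    = [System.Text.UTF8Encoding]::new($false)   # UTF-8 without a BOM

# Sample input that is larger than one buffer: 2000 lines of 7 characters each.
$text = "1;caf$([char]0xE9)`n" * 2000
[System.IO.File]::WriteAllText($inPath, $text, $latin1)

$reader = $writer = $null
try {
  $reader = [System.IO.StreamReader]::new($inPath, $latin1)
  $writer = [System.IO.StreamWriter]::new($outPath, $false, $utf8)   # (path, append, encoding)
  $buffer = [char[]]::new(4096)
  # Read up to 4096 characters at a time and write them straight out,
  # so memory use stays constant and newlines are preserved as-is.
  while (($count = $reader.Read($buffer, 0, $buffer.Length)) -gt 0) {
    $writer.Write($buffer, 0, $count)
  }
}
finally {
  if ($writer) { $writer.Dispose() }   # flushes the remaining characters
  if ($reader) { $reader.Dispose() }
}

$copied   = [System.IO.File]::ReadAllText($outPath, $utf8)
$outBytes = [System.IO.File]::ReadAllBytes($outPath).Length
Remove-Item -LiteralPath $inPath, $outPath
```

Unlike the ReadLines() / WriteAllLines() approach, this leaves the newline format and any missing trailing newline untouched. Either way, the try / finally scaffolding remains.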

A helper function, Use-Object (source code below) can alleviate this a bit:

Use-Object `
  ([System.IO.StreamReader]::new('C:\Temp\a.csv',[System.Text.Encoding]::GetEncoding('iso-8859-1'))), 
  ([System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8)) `
  { $_[1].Write($_[0].ReadToEnd()) }

Note how the disposable objects passed as the first argument are referenced via $_ as an array in the script-block argument (as usual, you may use $PSItem in lieu of $_).

A more readable alternative:

Use-Object `
  ([System.IO.StreamReader]::new('C:\Temp\a.csv',[System.Text.Encoding]::GetEncoding('iso-8859-1'))), 
  ([System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8)) `
  { 
    $reader, $writer = $_
    $writer.Write($reader.ReadToEnd()) 
  }

Or, perhaps even better, albeit with slightly different semantics (which will rarely matter),[1] as Darin suggests:

Use-Object `
  ($reader = [System.IO.StreamReader]::new('C:\Temp\a.csv',[System.Text.Encoding]::GetEncoding('iso-8859-1'))), 
  ($writer = [System.IO.StreamWriter]::new('C:\TEMP\b.csv', [System.Text.Encoding]::UTF8)) `
  { 
    $writer.Write($reader.ReadToEnd()) 
  }

Use-Object source code:

function Use-Object {
  param( 
    [Parameter(Mandatory)] $ObjectsToDispose,  # a single object or array
    [Parameter(Mandatory)] [scriptblock] $ScriptBlock
  )

  try {
    ForEach-Object $ScriptBlock -InputObject $ObjectsToDispose
  }
  finally {
    foreach ($o in $ObjectsToDispose) {
      if ($o -is [System.IDisposable]) {
        $o.Dispose()
      }
    }
  }
  
}

[1] With this syntax, you're creating the variables in the caller's scope, not in the function's, but this won't matter, as long as you don't try to assign different objects to these variables with the intent of also making the caller see such changes. (If you tried that, you would create a function-local copy of the variable that the caller won't see) - see this answer for details.

mklement0
  • 1
    The cleanup block sounds really cool, hope it's actually implemented – Santiago Squarzon Jan 17 '22 at 00:39
  • @mklement0, I wrote a PSCmdlet in C# named `Remove-Object`, the pipeline receives iDisposable, ComObject, and also classes/objects by looking for methods in the order of Dispose(), dispose(), Close(), close(), and if failing that it then does nothing - meaning that anything can be in the pipeline without throwing errors. So, is there a way to hand this code over to someone to approve of / reject, debug, add/remove features and then push forward some derived version of it into PSGallery as vexx32 mentioned in [#9886](https://github.com/PowerShell/PowerShell/issues/9886)? – Darin Oct 30 '22 at 03:41
  • Also, it isn't 100% conventional, or at least I don't think any other cmdlet accepts multiple objects in the pipeline and tosses them all at once to each scriptblock as $_0 (same as $_), $_1, $_2, etc... – Darin Oct 30 '22 at 03:41
  • Example use: `New-Object -ComObject WScript.Shell | Remove-Object {$null = $_.Popup('Message')}` – Darin Oct 30 '22 at 03:42
  • Example use: `"File content" | Out-File "$PSScriptRoot\TextFile.txt"; (New-Object System.IO.StreamReader("$PSScriptRoot\TextFile.txt")), 'File: ', (New-Object -ComObject WScript.Shell) | Remove-Object { while(-not $_0.EndOfStream) { $Line = $_0.ReadLine(); $null = $_2.Popup($_1 + $Line) } }` – Darin Oct 30 '22 at 03:42
  • @Darin, there's no approval process for publishing modules to the PS gallery, but you could ask for feedback in the GitHub issue you linked to, and perhaps also on https://codereview.stackexchange.com/, and then publish your code yourself. – mklement0 Oct 30 '22 at 07:02
  • @Darin, as for your non-conventional way of handling the input (`$_0`, ...): Please see my updated `Use-Object` function, which now allows using `$_` to access the _array_ of disposable objects passed as the first positional argument (`-ObjectsToDispose`), i.e. `$_[0]`, ... – mklement0 Oct 30 '22 at 07:05
  • @Darin, personally, I wouldn't use the verb `Remove` for your cmdlet, as that could be confusing. Similarly, accepting the objects to dispose via the pipeline could be confusing when combined with using `$_` _as an array_ - at least the cmdlet should _also_ support passing the objects as an array _argument_, the way that `Use-Object` does. – mklement0 Oct 30 '22 at 07:09
  • 1
    @Darin, but I like your approach, so I've added it to the answer, along with a footnote explaining the hypothetical caveat. – mklement0 Feb 27 '23 at 16:00