3
> cat .\foo.txt
abc

> cat .\foo.txt | md5sum
c13b6afecf97ea6b38d21a8f5167fa1e *-

> md5sum foo.txt
b79545611b3be30f90a0d21ef69bca82 *foo.txt

cat and md5sum are the Unix ports (from the Windows Git distribution).

This is a toy example of my real use case, which is piping binary data to a legacy Python script that I can't change. Because the pipe re-encodes the data, the binary file becomes corrupted.

I tried changing $OutputEncoding and [Console]::OutputEncoding and using chcp; none of them helped (but maybe I was not doing it right, this is all very convoluted...).

The utility suggested in "PowerShell's pipe adds linefeed" doesn't work for me because of how it handles process arguments: I need to pass several arguments to the legacy script, and some need to be quoted, but the utility accepts all arguments as a single string.

The optimal solution for me would be to somehow tell PowerShell to turn off encoding completely and just behave like Unix/cmd.

IttayD
  • Don't use pipe and specify the file as a parameter of md5sum? Or use the non-object pipe: `cmd /c cat .\foo.txt "|" md5sum` - note the quotes around the pipe symbol. – wOxxOm Nov 08 '16 at 05:45
  • I wrote that I gave a toy example; my actual use case is a script that must accept input from stdin and that I can't change. I've tried cmd /c, but it had issues with quoting. I'll give it another try – IttayD Nov 08 '16 at 06:33
  • @wOxxOm: it finally worked with cmd, but I wish I could use PS – IttayD Nov 08 '16 at 07:50
  • For pure-PS you can calculate MD5 using built-in .NET classes or implement a binary pipeline manually: [Output binary data on powershell pipeline](//stackoverflow.com/a/24745250) – wOxxOm Nov 08 '16 at 09:46
  • @wOxxOm: post your first comment as an answer. md5 was only used to show the changes; I'm using another program through a pipe – IttayD Nov 28 '16 at 08:08

2 Answers

2

There is no way around it, except to use cmd to run the commands including the pipe:

cmd /c cat.exe .\foo.txt "|" md5sum

Note that the pipe character is quoted, so it is interpreted by cmd and not by PowerShell.
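
If the cmd quoting becomes awkward, another option (not from the original answer) is a small wrapper that starts the consumer itself and hands it the file as stdin, so no shell pipe is involved at all. The following is a minimal Python sketch under that assumption; `pipe_bytes` and `HASH_STDIN` are hypothetical names, and `HASH_STDIN` is only a stand-in for the real legacy script and its arguments.

```python
import hashlib
import subprocess
import sys


def pipe_bytes(path, consumer_argv):
    """Feed the raw bytes of `path` to the stdin of `consumer_argv`.

    Returns the consumer's stdout as bytes. `consumer_argv` is a list, so
    each argument reaches the child process separately and needs no
    shell-level quoting.
    """
    # Binary mode: no decoding and no newline translation on the way in.
    with open(path, "rb") as src:
        result = subprocess.run(
            consumer_argv,
            stdin=src,               # the child reads the file handle directly
            stdout=subprocess.PIPE,
            check=True,              # raise if the consumer exits with an error
        )
    return result.stdout


# Stand-in consumer (hypothetical): prints the MD5 of whatever arrives on stdin.
HASH_STDIN = [
    sys.executable,
    "-c",
    "import sys, hashlib; print(hashlib.md5(sys.stdin.buffer.read()).hexdigest())",
]
```

Calling `pipe_bytes("foo.txt", HASH_STDIN)` should report the same hash as hashing the file directly, because the bytes never pass through a text pipeline. For the real use case, the list would hold the interpreter, the legacy script path, and its arguments as separate elements.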

IttayD
1

If you're using the Get-Content cmdlet, then follow the recommendation given at https://technet.microsoft.com/en-us/library/hh847788.aspx for dealing with binary data:

When reading from and writing to binary files, use a value of Byte for the Encoding dynamic parameter and a value of 0 for the ReadCount parameter.

Regardless of whether or not you're using Get-Content, you'll probably want to avoid ever having your data represented as a String. The String type is designed for character data, and doesn't do well for handling binary data.
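
To illustrate why a string representation is risky, here is a small Python sketch of the general failure mode. It is not a model of PowerShell's exact behaviour; the function name `round_trip_as_text` and the choice of UTF-8 are assumptions made for the demonstration.

```python
import hashlib


def round_trip_as_text(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode bytes to a string and encode them back, as a text pipeline might."""
    # Byte sequences that are invalid in `encoding` become U+FFFD here,
    # so the original values cannot be recovered afterwards.
    text = data.decode(encoding, errors="replace")
    return text.encode(encoding)


binary = bytes(range(256))              # every possible byte value
mangled = round_trip_as_text(binary)

# Plain ASCII survives the round trip; arbitrary binary data does not.
print(round_trip_as_text(b"abc") == b"abc")
print(mangled == binary)

# A single appended line ending is also enough to change a checksum.
print(hashlib.md5(b"abc").hexdigest() == hashlib.md5(b"abc\r\n").hexdigest())
```

The first comparison holds and the other two do not, which matches the symptom in the question: the same file yields different hashes depending on whether its contents passed through a text layer.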

Tanner Swett
  • I'm pretty sure it is the pipe that is causing all the problems. I've used the Unix cat, but I also tried the PowerShell one and the results still differ – IttayD Nov 08 '16 at 07:49