
I have a banking application script that generates a “filtered” output file by removing error records from a daily input bank file (see How do I create a Windows Server script to remove error records, AND the previous record to each, from a file with the results written to a NEW file). The “filtered” output file will be sent to the State for updating their system. As a side note, the original input files that we receive from the bank show as Unix 1252 (ANSI Latin 1) in my file editor (UltraEdit), and each record ends only with a line feed.

I sent a couple of test output files generated from both “clean” (no errors) and “dirty” (contained 4 errors) input files to the State for testing on their end to make sure all was good before implementation, but was a little concerned because the output files were generated in UTF-16 encoding with CRLF line endings, whereas the input and current unfiltered output are encoded in Windows-1252. All other output files on this system are Windows-1252 encoded.

Sure enough… I got word back that the encoding is incorrect for the state’s system. Their comments were: “The file was encoded UCS-2 Little Endian and needed to be converted to ANSI to run on our system. That was unexpected.

After that the file with no detail transactions would run through our EFT rejects program ok.

It seems that it was processed ok, but we had to do some conversion. Can it be sent in ANSI or needs to be done in UCS 2 Little Endian?”

I have tried unsuccessfully adding `-Encoding "Windows-1252"` and `-Encoding windows-1252` to my `Out-File` statement, with both returning the message:

    Out-File : Cannot validate argument on parameter 'Encoding'. The argument "Windows-1252"
    does not belong to the set "unknown,string,unicode,bigendianunicode,utf8,utf7,utf32,ascii,default,oem"
    specified by the ValidateSet attribute. Supply an argument that is in the set and then
    try the command again.
    At C:\EZTRIEVE\PwrShell\TEST2_FilterR02.ps1:47 char:57
    + ... OutputStrings | Out-File $OutputFileFiltered -Encoding "Windows-1252"
    +                                                  ~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidData: (:) [Out-File], ParameterBindingValidationException
        + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand

I’ve looked high and low for help with this for days, but nothing is really clear, and the vast majority of what I found involved converting FROM Windows-1252 TO another encoding. Yesterday, I found a comment somewhere on Stack Overflow that “ANSI” is the same as Windows-1252, but so far I have not found anything that shows me how to properly append the Windows-1252 encoding option to my `Out-File` statement so PowerShell will accept it. I really need to get this project finished so I can tackle the next several that have been added to my queue. Is there possibly a subparameter that I’m missing that needs to be appended to `-Encoding`?

This is being tested under Dollar Universe (job scheduler) on a new backup server running Windows Server 2016 Standard with PowerShell 5.1. Our production system runs Dollar Universe on Windows Server 2012 R2, also with PowerShell 5.1 (yes, we are looking for a sufficient upgrade window :-)

As of my last attempt, my PowerShell script is:

[cmdletbinding()]
Param
(
    [string] $InputFilePath
)

# Read the text file
$InputFile = Get-Content $InputFilePath

# Initialize output record counter
$Inrecs = 0
$Outrecs = 0

# Get the time
$Time = Get-Date -Format "MM_dd_yy"   # date stamp (MM_dd_yy)

# Set up the output file name
$OutputFileFiltered = "C:\EZTRIEVE\CFIS\DATA\TEST_CFI_EFT_RETURN_FILTERED"

# Initialize the variable used to hold the output
$OutputStrings = @()

# Loop through each line in the file
# Check the line ahead for "R02" and add it to the output
# or skip it appropriately
for ($i = 0; $i -lt $InputFile.Length - 1; $i++)
{
    if ($InputFile[$i + 1] -notmatch "R02")
    {
        # The next record does not contain "R02", increment count and add it to the output
        $Outrecs++
        $OutputStrings += $InputFile[$i]
    }
    else
    {
        # The next record does contain "R02", skip it
        $i++
    }
}

# Add the trailer record to the output
$OutputStrings += $InputFile[$InputFile.Length - 1]

# Write the output to a file
# $OutputStrings | Out-File $OutputFileFiltered
$OutputStrings | Out-File $OutputFileFiltered -Encoding windows-1252

# Display record processing stats:

$Filtered = $i - $Outrecs

Write-Host "$i input records processed"

Write-Host "$Filtered error records filtered out"

Write-Host "$Outrecs output records written"
K9-Guy
  • I remember during my searches there was a comment on a problem, somewhat related to my issue, saying that changing the encoding of the console output would affect PowerShell's output... – K9-Guy Mar 20 '19 at 16:10
  • I don't think there is such a thing as a Unix CP1252 and a Windows CP1252; rather, the format is the same, just CP1252, and in the first case the rows are separated using only a line feed rather than the usual carriage return + line feed. So to "convert" you need to replace the special character \n with \r\n. What happens if you apply $OutputStrings.ToString().Replace("`n","`r`n") | Out-File $OutputFileFiltered ? It would be easier to get help if you include a ready-to-run example of your PowerShell script. – A. Lion Mar 20 '19 at 16:34
  • @AdminOfThings: That would only work if the strings happened not to contain any characters outside the 7-bit ASCII range; any 8-bit ANSI chars. (e.g., accented chars. such as `ü`) would transliterate to _literal_ `?` chars., resulting in information loss. – mklement0 Mar 20 '19 at 18:02
  • @K9-Guy: The _console's_ encoding doesn't come into play here, only PowerShell's default file-output encoding, which you can change with `-Encoding`. If you happen to be on a system that uses Windows-1252 as the active ANSI code page, just use `-Encoding Default`; if not, more work is needed - see my answer. – mklement0 Mar 20 '19 at 18:05
  • @P.Lion: You're right, there's only one [Windows-1252 code page](https://en.wikipedia.org/wiki/Windows-1252). However, Unix-style LF-only newlines are _not_ a problem here (they rarely are in PowerShell, which equally recognizes LF and CRLF newlines): The newlines - whether LF-only or CRLF - are _stripped_ when `Get-Content` returns the input file's lines _as an array_. On later output with `Out-File` (or `Set-Content`, ...), individual strings are joined with the _platform-appropriate_ newline sequence, which on Windows means you'll end up with CRLF-newline files. – mklement0 Mar 20 '19 at 18:12
  • @AdminOfThings: -Encoding ASCII gives me output with CRLF line endings - I need LF only at line end. – K9-Guy Mar 20 '19 at 18:47
  • @P. Lion: Being a PS Newbie, I'm not sure I follow your "So to convert..." – K9-Guy Mar 20 '19 at 18:48
  • @mklement0: Sorry for delay in post - being pulled different directions w/so many projects in progress. Using $OutputStrings | Out-File $OutputFileFiltered -Encoding Default resulted in a DOS 1252 file with CRLF line endings. I then had a chat with a coworker and he suggested running from a batch file outside of the job scheduler to see what happens - just in case there may be something funny happening during Uproc execution. (continuation follows) – K9-Guy Mar 21 '19 at 23:23
  • @mklement: I created a .bat file to execute my script, then ran it from a command prompt as Administrator with $OutputStrings | Out-File -Encoding Default $OutputFileFiltered in the script. File output was also DOS 1252 with CRLF line endings. Having been a mainframe guy for decades, I need to figure out how to run PowerShell Core... and maybe .Net if that doesn't give the results I need. – K9-Guy Mar 21 '19 at 23:39
  • @K9-Guy: Yes, `Out-File` (and `Set-Content`) on Windows will give you CRLF newlines, invariably. PowerShell Core exhibits the same behavior, because the newline behavior is tied to the OS, not to the PowerShell edition. The bottom section of my answer shows you how to unconditionally create LF files. Have you tried that, and did something not work? – mklement0 Mar 22 '19 at 01:24
  • @K9-Guy You can see it easily if you open a Unix-style CP1252 file with Notepad++. Once opened, use CTRL+F, select the Replace tab and fill it in as follows: search: \n, replace with: \r\n, and set the search mode to "Extended (\n, \r, ...)". This will change each line break from a line feed only to CRLF, which is basically the difference in format between a Unix-like text file and a Windows-like one, assuming they both use the same encoding. – A. Lion Mar 22 '19 at 08:36
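The Get-Content / Out-File newline behavior described in the comments above can be demonstrated in a few lines (a minimal sketch; the file names are made up for illustration):

```powershell
# Create an LF-only test file (mimicking a Unix-style input file)
[IO.File]::WriteAllText("$PWD\lf-test.txt", "line1`nline2`n")

# Get-Content strips the newlines and returns an array of lines,
# so downstream cmdlets never see whether the input used LF or CRLF
$lines = Get-Content .\lf-test.txt
$lines.Count   # 2

# On Windows, Out-File rejoins the lines with the platform newline: CRLF
$lines | Out-File .\crlf-test.txt
(Get-Content -Raw .\crlf-test.txt) -match "`r`n"   # True on Windows
```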

1 Answer


Note:

  • You later clarified that you need LF (Unix-format) newlines - see the bottom section.

  • The next section deals with the question as originally asked and presents solutions that result in files with CRLF (Windows-format) newlines when run on Windows.


If your system's Language for non-Unicode programs setting (a.k.a. the system locale) happens to have Windows-1252 as the active ANSI code page (e.g., on US-English or Western European systems), use `-Encoding Default`, because `Default` refers to that code page in Windows PowerShell (but not in PowerShell Core, which defaults to BOM-less UTF-8 and doesn't support the `Default` encoding identifier).

Verify with: `(Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP) -eq '1252'`

... | Out-File -Encoding Default $file

Note:

  • If you are certain that your data is actually composed exclusively of ASCII-range characters (characters with code points in the 7-bit range, which excludes accented characters such as ü), -Encoding Default will work even if your system locale uses an ANSI code page other than Windows-1252, given that all (single-byte) ANSI code pages share all ASCII characters in their 7-bit subrange; you could then also use -Encoding ASCII, but note that if there are non-ASCII characters present after all, they will be transliterated to literal ? chars., resulting in loss of information.

  • The Set-Content cmdlet actually defaults to the Default encoding in Windows PowerShell (but not PowerShell Core, where the consistent default is UTF-8 without BOM).

  • While Set-Content's stringification behavior differs from that of Out-File - see this answer - it's actually the better choice if the objects to write to the file already are strings.
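The stringification difference can be seen with a non-string object (a minimal sketch; the output file names are made up):

```powershell
# Out-File writes the object's for-display representation - the same
# multi-line, table-style text you'd see in the console
Get-Item $PWD | Out-File .\outfile-demo.txt

# Set-Content writes each object's .ToString() value instead,
# which for a DirectoryInfo object is just its path
Get-Item $PWD | Set-Content .\setcontent-demo.txt

# With an array of strings the two produce the same lines,
# so Set-Content is the simpler choice for string input
'line1', 'line2' | Set-Content .\strings-demo.txt
```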


Otherwise, you have two options:

  • Use the .NET Framework file I/O functionality directly, where you can use any encoding supported by .NET; e.g.:

      $lines = ...  # array of strings (to become lines in a file)
      # CAVEAT: Be sure to specify an *absolute file path* in $file,
      #         because .NET typically has a different working dir.
      [IO.File]::WriteAllLines($file, $lines, [Text.Encoding]::GetEncoding(1252))
    
  • Use PowerShell Core, which allows you to pass any supported .NET encoding to the
    -Encoding parameter:

      ... | Out-File -Encoding ([Text.Encoding]::GetEncoding(1252)) $file
    

Note that in PSv5.1+ you can actually change the encoding used by the > and >> operators, as detailed in this answer.
However, in Windows PowerShell you are again limited to the encodings supported by Out-File's -Encoding parameter.
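As a concrete illustration of changing that default (a sketch; the preference applies to the current session only):

```powershell
# In PSv5.1+, > and >> are effectively aliases of Out-File, so presetting
# Out-File's -Encoding parameter changes the redirection operators too
$PSDefaultParameterValues['Out-File:Encoding'] = 'Default'

# This now produces an ANSI-encoded file in Windows PowerShell
'some text' > .\ansi-out.txt
```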


Creating text files with LF (Unix-format) newlines on Windows:

PowerShell (invariably) and .NET (by default) use the platform-native newline sequence - as reflected in [Environment]::NewLine - when writing strings as lines to a file.

  • In other words: on Windows you'll end up with files with CRLF newlines, and on Unix-like platforms with LF newlines.

Note that the solutions below assume that the data to write to your file is an array of strings that represent the lines to write, as returned by Get-Content, for instance (where the resulting array elements are the input file's lines without their trailing newline sequence).

To explicitly create a file with LF newlines on Windows (PSv5+):

$lines = ...  # array of strings (to become lines in a file)

($lines -join "`n") + "`n" | Set-Content -NoNewline $file

"`n" produces a LF character.
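To spot-check the result, you can read the file back raw and confirm that no CR characters remain (a minimal sketch; the file path is made up):

```powershell
# -Raw returns the file's content as a single string, newlines included
$raw = Get-Content -Raw .\lf-out.txt

# An LF-only file contains no CR characters
$raw -match "`r"   # False for a correctly written LF-only file
```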

Note:

  • In Windows PowerShell, Set-Content implicitly uses the active ANSI code page's encoding.

  • In PowerShell (Core, v6+), Set-Content implicitly creates a UTF-8 file without BOM. If you want to use the active ANSI code page instead, use:

    -Encoding ([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
    

In PSv4- (PowerShell version 4 or lower), you'll have to use the .NET Framework directly:

$lines = ...  # array of strings (to become lines in a file)


# CAVEAT: Be sure to specify an *absolute file path* in $file,
#         because .NET typically has a different working dir.
[IO.File]::WriteAllText($file, ($lines -join "`n") + "`n")

Note:

  • In both Windows PowerShell and PowerShell (Core, v6+) this creates a UTF-8 file without BOM.

  • If you want to use the active ANSI code page instead, pass the following as an additional argument to WriteAllText():

    ([Text.Encoding]::GetEncoding([int] (Get-ItemPropertyValue HKLM:\SYSTEM\CurrentControlSet\Control\Nls\CodePage ACP)))
    
mklement0