tl;dr: -Encoding ASCII does work, though your editor's GUI may still report the resulting file as UTF-8-encoded, for the reasons explained below.
First, a general caveat:
- If your input file also contains non-ASCII-range characters, they will be transliterated to verbatim ? characters, i.e. you'll potentially lose information.
- Conversely, if your input files are UTF-8-encoded but contain no non-ASCII characters, they are in effect already ASCII-encoded files; see below.
ASCII encoding is a subset of UTF-8 encoding (except that ASCII encoding never involves a BOM).
- Therefore, any (BOM-less) file composed exclusively of bytes representing ASCII characters is by definition also a valid UTF-8 file.
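You can see this subset relationship at the byte level by writing the same ASCII-only string with both encodings and comparing the results. The following is only a sketch; it assumes PowerShell (Core) 7+, where the utf8NoBOM encoding name is available, and the file names are arbitrary:
# Write identical ASCII-only content with ASCII and with BOM-less UTF-8 encoding.
'hello' | Out-File -Encoding Ascii ascii.txt
'hello' | Out-File -Encoding utf8NoBOM utf8.txt
# Identical hashes imply the two files are byte-for-byte identical.
(Get-FileHash ascii.txt).Hash -eq (Get-FileHash utf8.txt).Hash  # $true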
Modern editors default to BOM-less UTF-8; that is, if a file doesn't start with a BOM, they assume that it is UTF-8-encoded, and that's what their GUIs reflect - even if a given file happens to be composed of ASCII characters only.
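If you want to confirm that a given file really is BOM-less, you can inspect its first bytes directly; the following sketch uses some.txt as a placeholder path (a UTF-8 BOM is the byte sequence 0xEF 0xBB 0xBF):
# Returns $true only if the file starts with a UTF-8 BOM; files produced with
# -Encoding Ascii (and BOM-less UTF-8 files in general) yield $false.
$bytes = [System.IO.File]::ReadAllBytes((Convert-Path some.txt))
$bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF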
To verify that your output file is indeed only composed of ASCII characters, use the following:
# This should return $false; '\P{IsBasicLatin}' matches any NON-ASCII character.
(Get-Content -Raw File/Path/to/processed.txt) -cmatch '\P{IsBasicLatin}'
For an explanation of this test, especially with respect to needing to use -cmatch, the case-sensitive variant of the -match operator, see this answer.
A complete example:
# Write a string that contains non-ASCII characters to a
# file with -Encoding Ascii
# The resulting file will contain 1 line, with content 'caf?'
# That is, the "é" character was "lossily" transliterated to (ASCII) "?"
'café' | Out-File -Encoding Ascii temp.txt
# Examining the file for non-ASCII characters now indicates that
# there are none, i.e., $false is returned.
(Get-Content -Raw temp.txt) -cmatch '\P{IsBasicLatin}'
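As an optional follow-up, you can also inspect the output file's raw bytes to confirm that there is no BOM and that the "é" was replaced with a literal "?" (byte 0x3F):
# Dump the file's bytes: expect no leading EF BB BF, and only bytes in the
# 0x00-0x7F (ASCII) range; the "é" shows up as 0x3F, i.e. "?".
Format-Hex temp.txt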