Indeed, `Get-Content` by default reads and emits the input file's content line by line, with newlines of any flavor (CRLF, LF, CR) stripped.
While this behavior may be unfamiliar, it is generally helpful for processing files in the pipeline.
As your answer shows, `-Raw` can be used to read a file in full instead, as a single, multi-line string, which can offer great performance benefits.
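A minimal sketch of the difference, assuming a throwaway file in the temp directory (the file name `sample-raw.txt` is a hypothetical example). Note how the mixed CRLF and LF line endings are both stripped by default, whereas `-Raw` keeps them:

```powershell
# Write a small sample file with mixed CRLF and LF line endings.
# (The file name is a hypothetical example.)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-raw.txt'
[System.IO.File]::WriteAllText($path, "one,`r`ntwo`nthree,`n")

# Default: one string per line, with the line endings stripped.
$lines = Get-Content -LiteralPath $path
$lines.Count        # 3

# -Raw: the entire content as one string, line endings preserved.
$raw = Get-Content -LiteralPath $path -Raw
$raw -is [string]   # True

Remove-Item -LiteralPath $path
```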
To give an example of the convenience that line-by-line reading provides when combined with the regex-based `-replace` operator's ability to operate on each element of an input array (note, however, that this won't help if your file has LF (`\n`) line endings and you're selectively looking for rogue CRLF (`\r\n`) line endings preceded by `,`, because `Get-Content` strips both kinds of newlines alike):
```powershell
# Convenient, but can be made faster with -ReadCount 0; see below.
@(Get-Content file.txt) -replace ',$' | Set-Content file2.txt
```
Note: `@(...)`, the array-subexpression operator, is used to ensure that the `Get-Content` call's output is an array even if the file happens to have just one line.
The regex anchor `$` matches the end of each input string (line), so `-replace ',$'` in effect removes a trailing `,` from each line, where present.
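To see the element-wise behavior without needing a file, here is a small sketch using literal strings as stand-ins for file lines: with an array on the left-hand side, `-replace` returns an array of results, even for a one-element array, whereas a lone string (which is what a one-line file yields without `@(...)`) produces a lone string:

```powershell
# Array LHS: -replace acts on each element and returns an array.
$multi = @('a,', 'b', 'c,') -replace ',$'
$multi                # a, b, c (one per line)

# A one-element array still yields an array...
$single = @('x,') -replace ',$'
$single -is [array]   # True

# ...whereas a lone string yields a lone string.
$scalar = 'x,' -replace ',$'
$scalar -is [string]  # True
```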
`Get-Content` performance notes:
As hinted at above, `-Raw` is by far the fastest way to read a text file in full, but by design it returns the content as a single, multi-line string.
The default behavior, line-by-line reading, is slow, not least because PowerShell decorates each output line with metadata[1] (with `-Raw`, given that there's only one output string, that happens only once).
However, you can speed things up by reading lines in batches (arrays of lines of a given size) using the `-ReadCount` parameter, in which case only each array, not each individual line, is decorated. `-ReadCount 0` reads all lines into a single array.
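If you want to check the effect on your own system, a rough sketch along these lines compares the three approaches with `Measure-Command`. The file name and the line count are arbitrary assumptions, and absolute timings will vary by machine and PowerShell version, so no figures are shown here:

```powershell
# Create a test file with 100,000 lines (name and size are arbitrary).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'perf-test.txt'
Set-Content -LiteralPath $path -Value (1..100000 | ForEach-Object { "line $_," })

# Time each approach; assigning to $null discards the output.
$timings = [ordered]@{
  Default    = (Measure-Command { $null = Get-Content -LiteralPath $path }).TotalMilliseconds
  ReadCount0 = (Measure-Command { $null = Get-Content -LiteralPath $path -ReadCount 0 }).TotalMilliseconds
  Raw        = (Measure-Command { $null = Get-Content -LiteralPath $path -Raw }).TotalMilliseconds
}
$timings

# Sanity check: -ReadCount 0 still returns all lines, just in one array.
$lineCount = (Get-Content -LiteralPath $path -ReadCount 0).Count

Remove-Item -LiteralPath $path
```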
Note:
`-ReadCount` changes the streaming behavior in the pipeline: each array is then sent through the pipeline as a whole, which the receiving command needs to plan for, typically by performing its own enumeration of the array received, such as with a `foreach` loop.
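A minimal sketch of what the receiving command sees, assuming a 10-line sample file (hypothetical name) read in batches of 4: `ForEach-Object` receives each batch as a single array in `$_`, so the script block enumerates it with its own `foreach` loop:

```powershell
# Create a 10-line sample file (the file name is a hypothetical example).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'batches.txt'
Set-Content -LiteralPath $path -Value (1..10 | ForEach-Object { "line $_," })

# Each pipeline object is an array of up to 4 lines.
$batchSizes = Get-Content -LiteralPath $path -ReadCount 4 | ForEach-Object { $_.Count }
$batchSizes   # 4, 4, 2

# Enumerate each batch explicitly to process individual lines.
$trimmed = Get-Content -LiteralPath $path -ReadCount 4 | ForEach-Object {
  foreach ($line in $_) { $line -replace ',$' }
}
$trimmed.Count   # 10

Remove-Item -LiteralPath $path
```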
By contrast, using `-ReadCount 0` in the context of an expression makes no behavioral difference, so it can be used as a simple performance optimization that requires no other part of the expression to accommodate it; using an expression with a `-replace` operation as an example:
```powershell
# Read all lines directly into an array, with -ReadCount 0,
# instead of more slowly letting PowerShell stream the lines
# (emit them one by one) and then collect them in an array for you.
# The -replace operator then acts on each element of the array.
(Get-Content -ReadCount 0 file.txt) -replace ',$'
```
Note: `@(...)` is not necessary in this case, because `-ReadCount 0` always emits an array, even for single-line files.
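The difference shows up with a one-line file. In this sketch (file name hypothetical), indexing the default result picks out a character of the single string, whereas indexing the `-ReadCount 0` result picks out the line itself:

```powershell
# Create a single-line sample file (the file name is a hypothetical example).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'one-line.txt'
Set-Content -LiteralPath $path -Value 'only,'

$default = Get-Content -LiteralPath $path               # a lone [string]
$batched = Get-Content -LiteralPath $path -ReadCount 0  # a one-element array

$default[0]   # o      (first character of the string)
$batched[0]   # only,  (first and only line)

Remove-Item -LiteralPath $path
```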
A better-performing line-by-line processing alternative, although it cannot directly be used as part of an expression, is the `switch` statement with its `-File` parameter; see this answer for details.
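A minimal sketch of the `switch -File` approach applied to the same trailing-comma task (the file name is a hypothetical example); inside the `default` block, `$_` is the current line:

```powershell
# Create a small sample file (the file name is a hypothetical example).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'switch-demo.txt'
Set-Content -LiteralPath $path -Value 'a,', 'b', 'c,'

# switch -File reads the file line by line itself,
# with no Get-Content call involved.
$result = switch -File $path {
  default { $_ -replace ',$' }
}
$result   # a, b, c (one per line)

Remove-Item -LiteralPath $path
```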
[1] This metadata is provided in the form of ETS (Extended Type System) properties, which notably provide information about the line number and the path of the originating file. Pipe a `Get-Content` call to `Format-List -Force` to see these properties. While this extra information can be helpful, the performance impact of attaching it is noticeable. Given that the information is often not needed, having at least an opt-out would be helpful: see GitHub issue #7537.
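For instance, the `ReadCount` property holds each line's 1-based line number (with the default `-ReadCount 1`) and `PSPath` holds the originating file's provider path, so both can be accessed directly in the pipeline. A minimal sketch with a hypothetical sample file:

```powershell
# Create a small sample file (the file name is a hypothetical example).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'ets-demo.txt'
Set-Content -LiteralPath $path -Value 'a,', 'b', 'c,'

# Each emitted line carries ETS note properties, such as ReadCount
# (the line number) and PSPath (the originating file's provider path).
$lineNumbers = Get-Content -LiteralPath $path | ForEach-Object { $_.ReadCount }
$lineNumbers   # 1, 2, 3

$paths = Get-Content -LiteralPath $path | ForEach-Object { $_.PSPath }

Remove-Item -LiteralPath $path
```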