As Mike Anthony's helpful answer explains, there is no system-level function that efficiently implements what you're trying to do, so you have no choice but to rewrite your file.
While memory-intensive, the following solution is reasonably fast:
Read the file as a whole into memory, as a single string, using Get-Content's -Raw switch...
- This is orders of magnitude faster than the line-by-line streaming that Get-Content performs by default.
... then use regex processing to strip the first 10 lines ...
... and save the trimmed content back to disk.
Important:
Since this rewrites the file in place, be sure to have a backup copy of your file.
Use -Encoding with Get-Content / Set-Content to correctly interpret the input / control the output character encoding (PowerShell fundamentally doesn't preserve the information about the character encoding of a file that was read with Get-Content). Without -Encoding, the default encoding is the system's active ANSI code page in Windows PowerShell, and, more sensibly, BOM-less UTF-8 in PowerShell (Core) 7+.
# Use -Encoding as needed.
(Get-Content -Raw in.csv) -replace '^(?:.*\r?\n){10}' |
Set-Content -NoNewLine in.csv
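The two points above (backup first, explicit encoding) can be folded into a small helper. This is only a sketch: the function name Remove-LeadingLine, its parameters, and the .bak suffix are made up for illustration, and it assumes the same encoding is wanted for reading and writing. Note that in Windows PowerShell, -Encoding utf8 writes a BOM on output, whereas PowerShell (Core) 7+ does not.

```powershell
# Hypothetical helper (not a built-in command): strips the first $Count lines
# of a file in place, after creating a backup copy next to it.
function Remove-LeadingLine {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [int] $Count = 10,
    [string] $Encoding = 'utf8'   # assumption: same encoding for input and output
  )

  $fullPath = Convert-Path -LiteralPath $LiteralPath

  # Keep a backup, because the file is rewritten in place.
  Copy-Item -LiteralPath $fullPath -Destination "$fullPath.bak" -Force

  # Same regex as above, with the line count made variable.
  $regex = '^(?:.*\r?\n){' + $Count + '}'

  # The (...) around Get-Content ensures the file is read in full
  # before Set-Content starts writing to it.
  (Get-Content -Raw -Encoding $Encoding -LiteralPath $fullPath) -replace $regex |
    Set-Content -NoNewLine -Encoding $Encoding -LiteralPath $fullPath
}
```

Sample call: Remove-LeadingLine -LiteralPath in.csv -Count 10 -Encoding utf8. If the file has fewer than $Count lines, the regex doesn't match and the content is written back unchanged.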
If the file is too large to fit into memory:
If you happen to have WSL installed, an efficient, streaming tail solution is possible:
Note:
Your input file must use a character encoding in which a LF character is represented as a single 0xA byte - which is true of most single-byte encodings and also of the variable-width UTF-8 encoding, but not of, say, UTF-16.
You must output to a different file (which you can later replace the input file with).
bash.exe -c 'tail -n +11 in.csv > out.csv'
Otherwise, line-by-line processing is required.
Note: I'm leaving aside other viable approaches, namely reading and writing the file in large blocks, as zett42 recommends, or collecting (large) groups of output lines before writing them to the output file in a single operation, as shown in Theo's helpful answer.
Caveat:
All line-by-line processing approaches risk inadvertently changing the newline format of the original file: on writing the lines back to a file, it is invariably the platform-native newline format that is used (CRLF on Windows, LF on Unix-like platforms).
Also, the information as to whether the input file had a trailing newline or not is lost.
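If these details matter, you can inspect the input file up front and compare it with the output afterwards. The following is a minimal sketch, not a complete solution: Get-NewlineInfo is a made-up helper name, it assumes an encoding in which LF is the single byte 0xA and CR the single byte 0xD (e.g. UTF-8 or a single-byte encoding, but not UTF-16), and it looks only at the first newline, so files with mixed newline formats aren't detected.

```powershell
# Hypothetical helper: reports the newline format of the first line and
# whether the file ends in a newline. Assumes LF = 0x0A and CR = 0x0D as single bytes.
function Get-NewlineInfo {
  param([Parameter(Mandatory)] [string] $LiteralPath)

  $stream = [System.IO.File]::OpenRead((Convert-Path -LiteralPath $LiteralPath))
  try {
    # Scan forward to the first LF and check whether a CR precedes it.
    $newline = $null   # stays $null if the file contains no newline at all
    $prev = -1
    while (($b = $stream.ReadByte()) -ne -1) {
      if ($b -eq 0x0A) {
        $newline = if ($prev -eq 0x0D) { 'CRLF' } else { 'LF' }
        break
      }
      $prev = $b
    }

    # Check whether the very last byte is a LF.
    $hasTrailingNewline = $false
    if ($stream.Length -gt 0) {
      $null = $stream.Seek(-1, [System.IO.SeekOrigin]::End)
      $hasTrailingNewline = $stream.ReadByte() -eq 0x0A
    }

    [pscustomobject]@{
      Newline            = $newline
      HasTrailingNewline = $hasTrailingNewline
    }
  }
  finally {
    $stream.Dispose()
  }
}
```

Sample call: Get-NewlineInfo -LiteralPath in.csv. Running it against both in.csv and out.csv shows whether a line-by-line rewrite changed either property.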
Santiago's helpful answer shows a solution based on .NET APIs, which performs well by PowerShell standards.
Brice came up with an elegant and significant optimization that lets a .NET method perform the (lazy) iteration over the file's lines, which is much faster than looping in PowerShell code:
[System.IO.File]::WriteAllLines(
  "$pwd/out.csv",
  [Linq.Enumerable]::Skip(
    [System.IO.File]::ReadLines("$pwd/in.csv"),
    10
  )
)
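Two details of the command above are worth spelling out. Full paths are passed ("$pwd/...") because .NET's working directory usually differs from PowerShell's current location. Without an encoding argument, ReadLines assumes UTF-8 (while honoring a BOM) and WriteAllLines writes BOM-less UTF-8. If your file uses a different encoding, both methods have overloads that accept a System.Text.Encoding instance. Here is a sketch of a parameterized variant; the function name and its parameters are invented for illustration.

```powershell
# Hypothetical wrapper around the ReadLines / Skip / WriteAllLines technique,
# with an explicit encoding. Default: BOM-less UTF-8 (an assumption; adjust as needed).
function Copy-FileWithoutLeadingLine {
  param(
    [Parameter(Mandatory)] [string] $LiteralPath,
    [Parameter(Mandatory)] [string] $Destination,
    [int] $Count = 10,
    [System.Text.Encoding] $Encoding = [System.Text.UTF8Encoding]::new($false)
  )

  # Resolve both paths to full file-system paths, because .NET methods
  # don't know about PowerShell's current location.
  $inPath  = Convert-Path -LiteralPath $LiteralPath
  $outPath = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)

  # ReadLines enumerates the input lazily, so the file is never
  # held in memory as a whole.
  [System.IO.File]::WriteAllLines(
    $outPath,
    [Linq.Enumerable]::Skip(
      [System.IO.File]::ReadLines($inPath, $Encoding),
      $Count
    ),
    $Encoding
  )
}
```

Sample call: Copy-FileWithoutLeadingLine -LiteralPath in.csv -Destination out.csv. As with the original command, the output must go to a different file than the input, and the caveats above about newline format and trailing newline still apply.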
For the sake of completeness, here's a comparatively slower, PowerShell-native solution using a switch statement with the -File parameter for fast line-by-line reading (much faster than Get-Content):
& {
  $i = 0
  switch -File in.csv {
    default { if (++$i -ge 11) { $_ } }
  }
} | Set-Content out.csv # use -Encoding as needed
Note:
Since switch doesn't allow specifying a character encoding for the input file, this approach only works if the character encoding is correctly detected / assumed by default. While BOM-based files will be read correctly, note that switch makes different assumptions about BOM-less files based on the PowerShell edition: in Windows PowerShell, the system's active ANSI code page is assumed; in PowerShell (Core) 7+, it is UTF-8.
Because language statements cannot directly serve as pipeline input, the switch statement must be called via a script block (& { ... }).
Streaming the resulting lines to Set-Content via the pipeline is what slows the solution down. Passing the new file content as an argument to Set-Content's -Value parameter would drastically speed up the operation - but that would again require that the file fit into memory as a whole:
# Faster reformulation, but *input file must fit into memory as a whole*.
# `switch` offers a lot of flexibility. If that isn't needed
# and reading the file in full is acceptable,
# the Get-Content -Raw solution at the top is the fastest PowerShell solution.
Set-Content out.csv $(
  $i = 0
  switch -File in.csv {
    default { if (++$i -ge 11) { $_ } }
  }
)