180

I want to read a file line by line in PowerShell. Specifically, I want to loop through the file, store each line in a variable in the loop, and do some processing on the line.

I know the Bash equivalent:

while read -r line; do
    if [[ $line =~ $regex ]]; then
        # work here
    fi
done < file.txt

Not much documentation on PowerShell loops.

Peter Mortensen
Kingamere
  • 2
    The selected answer from Mathias is not a great solution. `Get-Content` loads the entire file into memory at once, which will fail or freeze on large files. – Kellen Stuart Jun 07 '19 at 18:23
  • 1
    @KolobCanyon that is completely untrue. By default Get-Content loads each line as one object in the pipeline. If you're piping to a function that doesn't specify a `process` block, and spits out another object per line into the pipeline, then that function is the problem. Any problems with loading the full content into memory are not the fault of `Get-Content`. – The Fish Jul 04 '19 at 12:39
  • 1
    @TheFish `foreach($line in Get-Content .\file.txt)` It will load the entire file into memory before it begins iterating. If you don't believe me, go get a 1GB log file and try it. – Kellen Stuart Jul 05 '19 at 15:00
  • 4
    @KolobCanyon That's not what you said. You said that Get-Content loads it all into memory which is not true. Your changed example of foreach would, yes; foreach is not pipeline aware. `Get-Content .\file.txt | ForEach-Object -Process {}` is pipeline aware, and will not load the entire file into memory. By default Get-Content will pass one line at a time through the pipeline. – The Fish Jul 08 '19 at 10:46

5 Answers

294

Not much documentation on PowerShell loops.

Documentation on loops in PowerShell is plentiful, and you might want to check out the following help topics: about_For, about_ForEach, about_Do, about_While.

foreach($line in Get-Content .\file.txt) {
    if($line -match $regex){
        # Work here
    }
}

Another idiomatic PowerShell solution to your problem is to pipe the lines of the text file to the ForEach-Object cmdlet:

Get-Content .\file.txt | ForEach-Object {
    if($_ -match $regex){
        # Work here
    }
}

Instead of regex matching inside the loop, you could pipe the lines through Where-Object to filter just those you're interested in:

Get-Content .\file.txt | Where-Object {$_ -match $regex} | ForEach-Object {
    # Work here
}
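
In all of the snippets above, $regex is assumed to hold a pattern you define yourself. As a rough, self-contained sketch (the pattern and the output text are only placeholders):

# Hypothetical pattern; replace it with whatever you actually need to match
$regex = '^ERROR'

Get-Content .\file.txt | Where-Object { $_ -match $regex } | ForEach-Object {
    # $_ is the current matching line
    "Matched: $_"
}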
Dan Atkinson
Mathias R. Jessen
  • 3
    the last one is the most idiomatic for powershell, and can be even more succinctly written with `gc 'file.txt' | ?{ $_ -match $regex } | %{ <#stuff#> }` – Hashbrown Jan 24 '21 at 02:41
  • 7
    Yes, but 'succinct' and 'lucid' are two different things. If you need anyone to read this script ever, then I beg you - don't do this to us. – Tom Padilla Oct 26 '22 at 14:34
83

Get-Content performs poorly on large files. Used with the foreach statement it collects every line in memory before iterating, and even when it streams through the pipeline it is slow, because it decorates each emitted line with extra provider metadata.

The .NET file reader, [System.IO.File]::ReadLines, yields one line at a time as it is enumerated.

Best performance:

foreach ($line in [System.IO.File]::ReadLines("C:\path\to\file.txt"))
{
    $line
}

Or, slightly less performant:

[System.IO.File]::ReadLines("C:\path\to\file.txt") | ForEach-Object {
    $_
}

The foreach statement will likely be slightly faster than ForEach-Object (see comments below for more information).
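
If you want to measure the difference on your own machine, a rough comparison with Measure-Command might look like the following sketch (the file path is just a placeholder):

# Rough, unscientific timing comparison; "C:\path\to\file.txt" is a placeholder
$statementTime = Measure-Command {
    foreach ($line in [System.IO.File]::ReadLines("C:\path\to\file.txt")) { $null = $line }
}

$pipelineTime = Measure-Command {
    [System.IO.File]::ReadLines("C:\path\to\file.txt") | ForEach-Object { $null = $_ }
}

"foreach statement: {0:n0} ms" -f $statementTime.TotalMilliseconds
"ForEach-Object   : {0:n0} ms" -f $pipelineTime.TotalMilliseconds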

Kellen Stuart
  • 6
    I would probably use `[System.IO.File]::ReadLines("C:\path\to\file.txt") | ForEach-Object { ... }`. The `foreach` statement will [load the entire collection to an object](https://blogs.technet.microsoft.com/heyscriptingguy/2014/07/08/getting-to-know-foreach-and-foreach-object/). `ForEach-Object` uses a pipeline to stream with. Now the `foreach` statement will likely be slightly faster than the `ForEach-Object` command, but that's because loading the whole thing to memory usually is faster. `Get-Content` is still terrible, however. – Bacon Bits Nov 07 '17 at 00:11
  • @BaconBits `foreach()` is an alias of `Foreach-Object` – Kellen Stuart Nov 07 '17 at 00:14
  • 24
    That is a very common misconception. `foreach` is a statement, like `if`, `for`, or `while`. `ForEach-Object` is a command, like `Get-ChildItem`. There is also a default alias of `foreach` for `ForEach-Object`, but it is only used when there is a pipeline. See the long explanation in `Get-Help about_Foreach`, or click the link in my previous comment which goes to an entire article by Microsoft's The Scripting Guys about the differences between the statement and the command. – Bacon Bits Nov 07 '17 at 13:10
  • 4
    @BaconBits https://blogs.technet.microsoft.com/heyscriptingguy/2014/07/08/getting-to-know-foreach-and-foreach-object/ Learned something new. Thanks. I assumed they were the same because `Get-Alias foreach` => `Foreach-Object`, but you are right, there are differences – Kellen Stuart Nov 07 '17 at 15:07
  • @BaconBits I added your suggestion to the answer – Kellen Stuart May 15 '18 at 17:10
  • 2
    That will work, but you'll want to change `$line` to `$_` in the loop's script block. – Bacon Bits May 15 '18 at 18:10
  • Kolob Canyon, I upvoted your answer because of the techniques it suggests. However, if what @BaconBits said is true, about foreach() loading the entire collection to an object--and your subsequent comments indicate that you agree--then it logically follows that the first snippet suffers from precisely the same problem that you are suggesting it as a remedy for. i.e., "...has bad performance [because it reads] the file into memory all at once." Your final statement attempts to clear this up, but I suggest editing the intro to your answer to make it clear up front. – Richard II May 29 '18 at 15:11
  • Not working on the Windows 7: `Method invocation failed because [System.IO.File] doesn't contain a method named 'ReadLines'.` [This answer works fine](https://stackoverflow.com/a/4192419/3632516). – anilech Nov 05 '18 at 09:23
  • @anilech You are probably on an old version of Powershell. If you're are on `2` or below, you should upgrade – Kellen Stuart Nov 05 '18 at 22:05
  • @KolobCanyon performance was never mentioned as an issue on the OP. – The Fish Jul 04 '19 at 12:40
  • 1
    @TheFish true, but this being a canonical question, I think people should know that using `Get-Content` is the devil. – Kellen Stuart Jul 05 '19 at 14:56
  • @BaconBits With Get-Content, I can join the strings like this `(Get-Content .\BTSManifest.txt | Join-String -Separator ',')` . How to perform such a join using these two methods? – FMFF Apr 14 '23 at 16:55
14

Reading Large Files Line by Line

Original comment (1/2021): I was able to read a 4 GB log file in about 50 seconds with the following. You may be able to make it faster by loading it as a C# assembly dynamically using PowerShell.

# Open the file; the [System.IO.StreamReader] type constraint wraps the FileStream in a reader
[System.IO.StreamReader]$sr = [System.IO.File]::Open($file, [System.IO.FileMode]::Open)
while (-not $sr.EndOfStream){
    $line = $sr.ReadLine()
    # Do the per-line processing here
}
$sr.Close()
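
A variant of the same idea, assuming PowerShell 5 or later for the ::new() constructor syntax and that $file holds the path as above, keeps the cleanup in a finally block so the reader is released even if the per-line processing throws:

$sr = [System.IO.StreamReader]::new($file)
try {
    while (-not $sr.EndOfStream) {
        $line = $sr.ReadLine()
        # process $line here
    }
}
finally {
    # Dispose also closes the underlying file handle
    $sr.Dispose()
}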

Addendum (3/2022): Processing the large file using C# embedded in PowerShell is even faster and has fewer "gotchas".

$code = @"
using System;
using System.IO;

namespace ProcessLargeFile
{
    public class Program
    {
        static void ProcessLine(string line)
        {
            return;
        }

        public static void ProcessLogFile(string path) {
            var start_time = DateTime.Now;
            StreamReader sr = new StreamReader(File.Open(path, FileMode.Open));
            try {
                while (!sr.EndOfStream){
                    string line = sr.ReadLine();
                    ProcessLine(line);
                }
            } finally {
                sr.Close();
            }
            var end_time = DateTime.Now;
            var run_time = end_time - start_time;
            string msg = "Completed in " + run_time.Minutes + ":" + run_time.Seconds + "." + run_time.Milliseconds;
            Console.WriteLine(msg);
        }

        static void Main(string[] args)
        {
            ProcessLogFile("c:\\users\\tasaif\\fake.log");
            Console.ReadLine();
        }
    }
}
"@
 
Add-Type -TypeDefinition $code -Language CSharp

PS C:\Users\tasaif> [ProcessLargeFile.Program]::ProcessLogFile("c:\\users\\tasaif\\fake.log")
Completed in 0:17.109
Tareq Saif
  • Tareq Saif -- 4 GB in 50 secs has not been true for me with this example. Am I missing something? – ToC Mar 03 '22 at 03:16
  • @ToC I tried it again today, and I believe I filtered my dataset first before performing any function calls, for example `if ($line.Contains("relevant information")) { # do something useful }`. If you try running a function on every line (even an empty function), it takes much longer. If you must run a function for each line and want it to run faster, I would look into parallelizing the code, maybe using threads. – Tareq Saif Mar 07 '22 at 00:21
  • Apparently, I can't go back to modify my comment. I tried embedding the C# in the PowerShell and it doesn't suffer from that limitation. With an empty function and just reading the lines, it processed in 18 seconds. I'll add the code to my answer above. – Tareq Saif Mar 07 '22 at 00:47
  • Thank you, I'll try this and see how it plays out. Appreciate you taking the time to add more details! – ToC Mar 08 '22 at 01:45
10

The almighty switch works well here:

'one
two
three' > file

$regex = '^t'

switch -regex -file file { 
  $regex { "line is $_" } 
}

Output:

line is two
line is three
mklement0
js2010
3

Set-Location 'C:\files'
$files = Get-ChildItem -Name -Filter *.txt
foreach ($file in $files) {
    Write-Host ("Start Reading file: " + $file)
    foreach ($line in Get-Content $file) {
        Write-Host $line
    }
    Write-Host ("End Reading file: " + $file)
}
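
If the files are large, the same loop can stream each file's lines through the pipeline instead of collecting them with the foreach statement first. A sketch of that variant, under the same folder assumption as above:

Set-Location 'C:\files'
Get-ChildItem -Filter *.txt | ForEach-Object {
    $file = $_
    Write-Host ("Start Reading file: " + $file.Name)
    # Get-Content emits one line at a time into the pipeline
    Get-Content $file.FullName | ForEach-Object {
        Write-Host $_
    }
    Write-Host ("End Reading file: " + $file.Name)
}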
