I am using PowerShell 7.

I have the following PowerShell script that parses some very large log files.

I no longer want to use 'Get-Content', as it is too slow.

The script below works, but it takes a very long time to process even a 10 MB file.

I have about 200 files of roughly 10 MB each, each with over 10,000 lines.

Sample Log:

#Fields:1
#Fields:2
#Fields:3
#Fields:4
#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,
2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,

Script:

$Output = @()
$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-Item  $LogFilePath
$Count = @($logfiles).count

ForEach ($Log in $LogFiles)
{
    $Int = $Int + 1
    
    $Percent = $Int/$Count * 100

    Write-Progress -Activity "Collecting Log details" -Status "Processing log File $Int of $Count - $Log" -PercentComplete $Percent 

    Write-Host "Processing Log File  $Log" -ForegroundColor Magenta
    Write-Host
    $FileContent = Get-Content $Log | Select-Object -Skip 5
    ForEach ($Line IN $FileContent)
    {

        $Socket = $Line  | Foreach {$_.split(",")[5] }

        $IP = $Socket.Split(":")[0]

        $Output += $IP

    } 
} 
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object

Write-Host "List of noted remote IPs:" 
$Output
Write-Host 
$Output | Out-File $PWD\Output.txt 
Arbelac
  • Try to [avoid using the increase assignment operator (+=) to create a collection](https://stackoverflow.com/a/60708579/1701026) – iRon Feb 01 '23 at 18:22
  • Use a [hashset](https://learn.microsoft.com/dotnet/api/system.collections.generic.hashset-1?view=net-7.0) instead of: `Select-Object -Unique | Sort-Object` – iRon Feb 01 '23 at 18:31
  • Use `Get-Content -Raw ...` – iRon Feb 01 '23 at 18:33
  • You have a nested loop with a complexity of O(n^2) which is always slow and in addition to that you have another loop while parsing your sockets. Pipes are quite slow. Select-Object can be removed and done manually and you can switch from ForEach to a for loop. You can also use .Net objects or even invoke a C# code from within Powershell. All that is quicker than PowerShell with ForEach and Pipes. –  Feb 01 '23 at 18:38
  • Check your memory usage in Task Manager. You may be using all your memory. What type of file system do you have? A slow disk may make code run slowly. Your code should run quickly. I would add code that works with the file in a using block to make sure the file is disposed properly. – jdweng Feb 01 '23 at 18:43

3 Answers

As @iRon suggests, building the collection with the addition assignment operator (+=) adds a lot of overhead, as does reading each entire file into a variable before processing it. Consider processing everything strictly as a pipeline instead. Using your sample data, I got the same results with the code below.

$LogFilePath = "C:\LOGS\*.log"
$LogFiles = Get-ChildItem $LogFilePath
$Count = @($logfiles).count

$Output = ForEach($Log in $Logfiles) {
    # Code for Write-Progress here
    Get-Content -Path $Log.FullName | Select-Object -Skip 5 | ForEach-Object {
        $Socket = $_.split(",")[5] 
        $IP = $Socket.Split(":")[0]
        $IP
    }
}
$Output = $Output | Select-Object -Unique
$Output = $Output | Sort-Object

Write-Host "List of noted remote IPs:" 
$Output
Grok42

Apart from the notable points in the comments, I believe this question is better suited to Code Review. Nonetheless, here's my take on it using the StreamReader class:

$LogFilePath = "C:\LOGS\*.log"
$LogFiles    = Get-Item -Path $LogFilePath
$OutPut      = [System.Collections.ArrayList]::new()

foreach ($log in $LogFiles)
{
    $skip   = 0
    $stream = [System.IO.StreamReader]::new($log.FullName)
    try
    {
        # Compare against $null so that an empty line does not end the loop early
        while ($null -ne ($line = $stream.ReadLine()))
        {
            # Skip the 5 "#Fields" header lines
            if ($skip -lt 5)
            {
                $skip++
                continue
            }
            # The timestamp's two colons add two extra tokens, so in the sample log
            # the remote IP is at index 8 (index 6 is the local IP)
            $IP = ($line -split ',|:')[8]
            if ($OutPut.Contains($IP))
            {
                continue
            }
            $null = $OutPut.Add($IP)
        }
    }
    finally
    {
        # Dispose also closes the underlying file
        $stream.Dispose()
    }
}
# Display OutPut and save to file
Write-Host -Object "List of noted remote IPs:" 
$OutPut | Sort-Object | Tee-Object -FilePath "$PWD\Output.txt"

This way you only output unique IPs, since an if statement checks each one against what's already in $OutPut, essentially replacing Select-Object -Unique. You should see a speed increase, as you're no longer appending to a fixed-size array with += or piping each line to other cmdlets.
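
The index to use with `-split ',|:'` depends on how many colons appear before the field you want, so it can help to check it on a single line first. The sketch below is only an illustration: it uses one line copied from the question's sample log, and the variable names are hypothetical.

```powershell
# One data line copied from the sample log in the question
$line = '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'

# Splitting on commas and colons together: the timestamp alone yields three tokens
$tokens   = $line -split ',|:'
$localIP  = $tokens[6]   # local-endpoint address
$remoteIP = $tokens[8]   # remote-endpoint address

# Splitting on commas first, then on the colon, does not depend on the timestamp
$remoteIP2 = $line.Split(',')[5].Split(':')[0]

"Local:  $localIP"
"Remote: $remoteIP"
"Remote: $remoteIP2"
```

If your real log has a different column layout than the sample, the indexes will differ, so inspect `$tokens` for one of your own lines before running the full script.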

Abraham Zinala
  • First of all, thanks for your answer. However, I can't get the correct output from your script. Using the sample log file above, I only get the output below. I assume this might be a parsing issue. – Arbelac Feb 04 '23 at 19:02
  • Output : 195.85.212.25 – Arbelac Feb 04 '23 at 19:03
  • desired output : 195.85.212.25 195.85.212.22 – Arbelac Feb 04 '23 at 19:04
  • So, it's only displaying one IP instead of 2? If so, is there spacing between lines? – Abraham Zinala Feb 04 '23 at 20:31
  • No, there is no spacing between the lines. – Arbelac Feb 06 '23 at 18:27
  • My sample log is https://paste.ee/p/EhINe – Arbelac Feb 06 '23 at 18:36
  • @Arbelac, I see. Use the index of `[6]` instead of `[-5]` then, or wherever your IP falls. I edited the code to include the `6`th index instead. – Abraham Zinala Feb 06 '23 at 23:33
  • Still the same issue. Based on my sample log, my desired output is: 192.168.100.15 192.168.100.16, but when I run your script I get: 192.168.100.11. I assume your script is reading the wrong column in the CSV file. – Arbelac Feb 07 '23 at 08:17
  • @Arbelac, correct, hence the "*or wherever your IP falls*". It may be the 8th index if that's the case; referencing your .bin. – Abraham Zinala Feb 07 '23 at 15:30

You can combine File.ReadLines with Enumerable.Skip to read your files and skip their first 5 lines. This method is much faster than Get-Content. Then for sorting and getting unique strings at the same time you can use a SortedSet<T>.

You should avoid using Write-Progress as this will slow your script down in Windows PowerShell (this has been fixed in newer versions of PowerShell Core).

Do note that because you're looking to sort the result, all strings must be held in memory before outputting to a file. This would be much more efficient if sorting was not needed; in that case you would use a HashSet<T> instead to get unique values.

Get-Item C:\LOGS\*.log | & {
    begin { $set = [Collections.Generic.SortedSet[string]]::new() }
    process {
        foreach($line in [Linq.Enumerable]::Skip([IO.File]::ReadLines($_.FullName), 5)) {
            $null = $set.Add($line.Split(',')[5].Split(':')[0])
        }
    }
    end {
        $set
    }
} | Set-Content $PWD\Output.txt 
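
For the case where sorting is not needed, a minimal sketch of the HashSet<T> variant could look like the following. It is an illustration under stated assumptions, not a drop-in replacement: the temporary demo file and the names `$demo`, `$seen` and `$unique` are hypothetical, and the file contents are the question's sample lines plus one duplicate.

```powershell
# Hypothetical demo file so the sketch runs on its own; point this at your real logs instead
$demo = Join-Path ([IO.Path]::GetTempPath()) 'hashset-demo.log'
@(
    '#Fields:1'
    '#Fields:2'
    '#Fields:3'
    '#Fields:4'
    '#Fields: date-time,connector-id,session-id,sequence-number,local-endpoint,remote-endpoint,event,data,context'
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.25:15650,<,DATA,'
    '2023-01-31T13:53:50.404Z,EXCH1\Relay-EXCH1,08DAD23366676FF1,41,10.10.10.2:25,195.85.212.22:15650,<,DATA,'
) | Set-Content -Path $demo

$seen = [Collections.Generic.HashSet[string]]::new()

# Emit each IP the first time it is seen, in file order, without sorting
$unique = foreach ($line in [Linq.Enumerable]::Skip([IO.File]::ReadLines($demo), 5)) {
    $ip = $line.Split(',')[5].Split(':')[0]
    # HashSet.Add returns $true only when the value was not already present
    if ($seen.Add($ip)) { $ip }
}

Remove-Item -Path $demo
$unique
```

Because each value is emitted as soon as it is first seen, the output could be streamed straight to a file instead of being collected in `$unique`; only the set of distinct IPs has to stay in memory.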
Santiago Squarzon
  • Hi , I have an issue related to the `export-csv in invoke-command ` https://stackoverflow.com/questions/76515330/invoke-command-and-outputting-to-a-csv-file – Arbelac Jun 20 '23 at 14:43