
I have a directory with ~3,000 text files, and I'm doing periodic search-and-replace passes on those files as I transition a program to a new server.

Each text file averages ~3,000 lines, and I need to search the files for maybe 300–1,000 terms at a time.

I'm replacing the server prefix, which is tied to the string I'm searching for. For every CSV entry, I'm looking for either Search_String or \\Old_Server\Search_String, and making sure that after the program completes the result is \\New_Server\Search_String.
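To make that concrete (using two of the qualifiers from the prompts in the script below, F24 as the old qualifier and CB3 as the new one, purely as an example): a CSV entry of Search_String means that a line containing either of

Search_String
\\F24\Search_String

should end up as

\\CB3\Search_String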

I cobbled together a PowerShell program, and it works, but it's so slow I've never seen it complete.

Any suggestions for making it faster?

EDIT 1: I changed Get-Content as suggested, but it still took 3 minutes to search two files (~8,000 lines) for 9 separate search terms. I must still be screwing up; a Notepad++ search and replace would still be way faster even if done manually 9 times.

I'm not sure how to get rid of the first Get-Content, because I want to make a copy of the file for backup before I make any changes to it.
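(As it turned out, the backup copy doesn't need Get-Content at all: Copy-Item and Select-String both work directly on the file path, which is what the final script below ends up doing, roughly like this:

if (Select-String -Path $FileName -Pattern $findvar -Quiet) {
    Copy-Item $FileName -Destination $DirName   # back up the original before editing it
    # ...then read, replace, and rewrite the file
}
)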

EDIT 2: So this is an order of magnitude faster; it's searching a file in maybe 10 seconds. But now it doesn't write changes to files, and it only searches the first file in the directory! I didn't change that code, so I don't know why it broke.

EDIT 3: Success! I adapted a solution posted below to make it much, much faster. It's searching each file in a couple of seconds now. I may reverse the loop order, so that it loads the file into the array and then searches and replaces each entry in the CSV rather than the other way around. I'll post that if I get it to work.

Final script is below for reference.

#get input from the user
$old = Read-Host 'Enter the old Cimplicity qualifier (F24, IRF3 etc)'
$new = Read-Host 'Enter the new cimplicity qualifier (CB3, F24_2 etc)'
$DirName = Get-Date -format "yyyy_MM_dd_hh_mm"

New-Item -ItemType directory -Path $DirName -force
New-Item "$DirName\log.txt" -ItemType file -force -Value "`nMatched CTX files on $dirname`n"
$logfile = "$DirName\log.txt"

$VerbosePreference = "SilentlyContinue"


$points = import-csv SearchAndReplace.csv -header find #Import CSV File
#$ctxfiles = Get-ChildItem . -include *.ctx | select -expand fullname #Import local directory of CTX Files

$points | foreach-object { #For each row of points in the CSV file
    $findvar = $_.find #Store column 1 as string to search for  

    $OldQualifiedPoint = "\\\\"+$old+"\\" + $findvar #Escape each individual backslash so it's not read as regex
    $NewQualifiedPoint = "\\"+$new+"\" + $findvar #escape slashes are NOT required on the new string
    $DuplicateNew = "\\\\" + $new + "\\" + "\\\\" + $new + "\\"
    $QualifiedNew = "\\" + $new + "\"

    dir . *.ctx | #Grab all CTX Files 
     select -expand fullname | #grab all of those file names and...
      foreach {#iterate through each file
                $DateTime = Get-Date -Format "hh:mm:ss"
                $FileName = $_
                Write-Host "$DateTime - $FindVar - Checking $FileName"
                $FileCopied = 0
                #Check file contents, and copy matching files to newly created directory
                If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
                   If (!($FileCopied)) {
                        Copy $FileName -Destination $DirName
                        $FileCopied = 1
                        Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
                        Write-Host "$DateTime - Found $Findvar in $filename"
                    }

                    $FileContent = Get-Content $Filename -ReadCount 0
                    $FileContent =
                    $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
                    $FileContent | Set-Content $FileName
                }
           }
    }       
Justin
  • You're still using get-content in the conditional check, so it's still going to take a long time. It will be much faster just to do the replace and then check if you changed anything and output that as your "XX found" – JNK Mar 14 '14 at 17:52
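A rough sketch of what JNK's comment suggests, reusing the script's variable names (Compare-Object is just one way to detect that a replacement actually changed something; untested):

$FileContent = Get-Content $FileName -ReadCount 0
$NewContent  = $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
if (Compare-Object $FileContent $NewContent) {   # any difference means at least one replacement happened
    Copy-Item $FileName -Destination $DirName    # back up the original before overwriting it
    Add-Content $logfile "`n$DateTime - Found $findvar in $FileName"
    $NewContent | Set-Content $FileName
}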

3 Answers


If I'm reading this correctly, you should be able to read a 3,000-line file into memory and do those replaces as an array operation, eliminating the need to iterate through each line. You can also chain those replace operations into a single command.

dir . *.ctx | #Grab all CTX Files 
     select -expand fullname | #grab all of those file names and...
      foreach {#iterate through each file
                $DateTime = Get-Date -Format "hh:mm:ss"
                $FileName = $_
                Write-Host "$DateTime - $FindVar - Checking $FileName"
                #Check file contents, and copy matching files to newly created directory
                If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
                    Copy $FileName -Destination $DirName
                    Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
                    Write-Host "$DateTime - Found $Findvar in $filename"

                    $FileContent = Get-Content $Filename -ReadCount 0
                    $FileContent =
                      $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
                     $FileContent | Set-Content $FileName
                }
           }

On another note, Select-String will take the filepath as an argument, so you don't have to do a Get-Content and then pipe that to Select-String.
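For example (illustrative only):

# instead of: Get-Content $FileName | Select-String -Pattern $findvar -Quiet
Select-String -Path $FileName -Pattern $findvar -Quiet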

mjolinor
  • I see what you meant, wasn't aware of the readcount parameter and `-r 0` didn't mean much to me. Makes a world of difference though, cool cool. – Cole9350 Mar 14 '14 at 19:11
  • Awesome! That worked perfectly, and it was fast. Since I'm loading the entire file into an array, I think at this stage it would be quicker to change the loop order. Currently I'm pulling a CSV entry and then searching through all the files; it'd probably be faster to open a file and then search through all the CSV entries. Thanks! – Justin Mar 14 '14 at 19:16
  • In that case, I'd get rid of the select-string test altogether, and just run every file through the CSV collection. It will probably be faster than going back and running another select-string on every iteration through the CSV loop. – mjolinor Mar 14 '14 at 19:19
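A rough sketch of that reversed loop (assuming the same $points, $old, $new, $DirName, $DuplicateNew and $QualifiedNew setup as in the final script above; untested):

dir . *.ctx | select -expand fullname | foreach {
    $FileName    = $_
    Copy-Item $FileName -Destination $DirName             # back up every file before editing (the original only copied matches)
    $FileContent = Get-Content $FileName -ReadCount 0     # read the whole file into memory once
    foreach ($row in $points) {                           # inner loop: run every CSV entry against this file
        $findvar           = $row.find
        $OldQualifiedPoint = "\\\\" + $old + "\\" + $findvar
        $NewQualifiedPoint = "\\" + $new + "\" + $findvar
        $FileContent       = $FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
    }
    $FileContent | Set-Content $FileName                  # write the file back once
}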

Yes, you can make it much faster by not using Get-Content... use a StreamReader instead.

$file = New-Object System.IO.StreamReader -Arg "test.txt"
while (($line = $file.ReadLine()) -ne $null) {
    # $line has your line
}
$file.dispose()
Cole9350
  • `while($line = $file.ReadLine())` will stop on the first empty line. Comparison with $null is better. – Roman Kuzmin Mar 14 '14 at 17:10
  • Yeah; most of the files have null lines scattered through them :P – Justin Mar 14 '14 at 17:33
  • Using .ReadLine() is still doing a line at a time. These are only 3000-line files, so you should be able to read the whole file into memory and then do an array replace, e.g. `(Get-Content $FileName -r 0) -replace` – mjolinor Mar 14 '14 at 18:06
  • @mjolinor Idk what that has to do with my post... maybe consider commenting on the original question, or even posting your answer. – Cole9350 Mar 14 '14 at 18:41
  • I would think that each file could be loaded into memory pretty quickly; none of them are over a couple of megs, but like I said, it was unusably slow. I don't know much of anything about PowerShell, but I was advised to read them line by line. – Justin Mar 14 '14 at 18:53

I wanted to use PowerShell for this and created a script like the one below:

$filepath = "input.csv"
$newfilepath = "input_fixed.csv"

filter num2x { $_ -replace "aaa","bbb" }
measure-command {
    Get-Content -ReadCount 1000 $filepath | num2x | add-content $newfilepath
}    

It took 19 minutes on my laptop to process a 6.5 GB file. The code above reads the file in batches (using -ReadCount) and uses a filter, which should optimize performance.

But then I tried FART (the "Find And Replace Text" command-line tool) and it did the same thing in 3 minutes! Quite a difference!

mishkin