
I have a PowerShell script that searches folders recursively for a user-supplied string and replaces it. In addition, I want to save the file paths of only the files that contained a match. Currently my script saves the paths of all the files it looks through (all the .config files), not just the matches. Here is my code, which works for saving all of them. How do I write only the file paths that had a match to my output file?

$rootDir = GCI $dirPath -recurse -include *.config
ForEach($dir in $rootDir)
{
    (Get-Content $dir).replace($oldString,$newString) | Set-Content $dir

    $dir.FullName >> $filePathName
}
taraloca

2 Answers


Seems like you're missing a check before the replacement: only output the file's absolute path if the content matches $oldString.

Also, your code could benefit from pipeline processing and the use of -Raw for efficiency:

Get-ChildItem $dirPath -Recurse -Include *.config | ForEach-Object {
    # `-Raw` reads the content as a single multi-line string
    $content = $_ | Get-Content -Raw
    # if there is a match
    if ($content -like "*$oldString*") {
        # replace and store the updated version of the file
        $content.Replace($oldString, $newString) | Set-Content $_.FullName -NoNewLine
        # and output the fullpath
        $_.FullName
    }
} | Set-Content $filePathName
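
One caveat with the -like check above: -like is case-insensitive and interprets wildcard characters such as [ and ], whereas .Replace() is a case-sensitive, literal replacement, so the two can disagree. A minimal sketch, assuming hypothetical sample values, that contrasts -like with the literal, case-sensitive .Contains() string method:

```powershell
# Hypothetical sample values, for illustration only.
$content   = 'Server=PROD'
$oldString = 'prod'

# -like ignores case, so it reports a match...
$likeSaysMatch = $content -like "*$oldString*"
# ...but .Replace() is case-sensitive and changes nothing here.
$afterReplace = $content.Replace($oldString, 'test')
# .Contains() is case-sensitive and literal, consistent with .Replace().
$containsSaysMatch = $content.Contains($oldString)

"-like:      $likeSaysMatch"
"Replace:    $afterReplace"
".Contains:  $containsSaysMatch"
```

With these values, -like reports a match even though .Replace() leaves the string untouched, so the file's path would be written to the output file without the file having changed. If that matters for your search strings, replacing the -like test with $content.Contains($oldString) is one option.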
Santiago Squarzon
  • 1
    good call, thanks @mklement0 – Santiago Squarzon Aug 07 '23 at 15:11
  • 2
    Thank you both! With or without the -NoNewLine I still get each path on a separate line, which is what I want! So are you saying that more lines could be added without this? I am fairly new to PowerShell and appreciate your knowledge for efficiency. – taraloca Aug 07 '23 at 15:16
  • 1
    @taraloca glad to help. He mentioned `-NoNewLine` mainly because otherwise the files we're replacing will end up with 2 empty lines at the end instead of only 1 – Santiago Squarzon Aug 07 '23 at 15:23

This is to complement Santiago's helpful answer, which offers an effective solution and additionally speeds up the operation by using Get-Content's -Raw switch, which reads a given text file in full, as a single, (typically) multiline string.

An additional optimization is possible:

  • .NET's .Replace() string-replacement method returns the input instance as-is if no actual replacement was performed (the same applies analogously to PowerShell's regex-based -replace operator).

  • Thus, if the input string and the result string are the exact same [string] instance, the implication is that NO substitutions were performed, from which you can infer that the search string wasn't present.
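
The two points above can be checked in isolation. A minimal sketch, using hypothetical sample strings rather than file content:

```powershell
# Hypothetical sample string, for illustration only.
$content = 'alpha beta'

# The search string is absent: .Replace() hands back the very same instance.
$unchanged = $content.Replace('gamma', 'delta')
# The search string is present: a new string instance is returned.
$changed = $content.Replace('beta', 'delta')

$sameWhenAbsent  = [object]::ReferenceEquals($content, $unchanged)
$sameWhenPresent = [object]::ReferenceEquals($content, $changed)

"Same instance when search string is absent:  $sameWhenAbsent"
"Same instance when search string is present: $sameWhenPresent"
```

The first check should report True and the second False, which is what makes the reference comparison usable as a "was anything replaced?" test.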

Therefore, using this technique saves you from having to search the file contents twice (once to look for the string, and again during replacement), because you can infer from the result of the .Replace() call whether the search string was present in the file at hand:

Get-ChildItem -LiteralPath $dirPath -Recurse -Filter *.config | ForEach-Object {
    $content = $_ | Get-Content -Raw
    # Perform the desired replacement.
    $modifiedContent = $content.Replace($oldString, $newString)
    # Were actual replacements made? If not, the implication is that
    # $oldString wasn't present in the file at hand.
    $actuallyModified = -not [object]::ReferenceEquals($content, $modifiedContent)
    if ($actuallyModified) {
        $modifiedContent | Set-Content -NoNewLine -LiteralPath $_.FullName # Save modified content.
        $_.FullName # Output the path of the modified file to the pipeline.
    }
    }
} | Set-Content -LiteralPath $filePathName

Note the use of -LiteralPath to ensure that all paths are treated literally (verbatim); by default, with the (positionally implied) -Path parameter, paths are interpreted as wildcard expressions, which notably causes problems with literal paths that contain [ and ]. Due to a bug in Windows PowerShell (since fixed in PowerShell (Core) 7+), -LiteralPath cannot be combined with -Include - see this answer - which is why -Filter is used instead (which is actually a blessing, because it is much faster than -Include).
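
To see the difference between -Path and -LiteralPath, here is a minimal sketch that creates a throwaway folder with [ and ] in its name. The folder name, the temp location, and the app.config file name are assumptions made purely for illustration:

```powershell
# Create a throwaway folder whose name contains [ and ] (hypothetical name).
$name = 'demo[1]-' + [guid]::NewGuid().ToString('N')
$dir  = [IO.Path]::Combine([IO.Path]::GetTempPath(), $name)
$null = [IO.Directory]::CreateDirectory($dir)
$file = [IO.Path]::Combine($dir, 'app.config')
Set-Content -LiteralPath $file -Value 'x'

# -Path interprets [1] as a wildcard character set, so the folder isn't found.
$viaPath = Test-Path -Path $dir
# -LiteralPath uses the path verbatim.
$viaLiteralPath = Test-Path -LiteralPath $dir

# -Filter can be combined with -LiteralPath.
$found = Get-ChildItem -LiteralPath $dir -Recurse -Filter *.config

"Test-Path -Path:        $viaPath"
"Test-Path -LiteralPath: $viaLiteralPath"
"Files found:            $($found.Name)"

Remove-Item -LiteralPath $dir -Recurse   # Clean up the throwaway folder.
```

Here -Path should fail to find the folder, because [1] is read as "the character 1", while -LiteralPath finds it.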

Character-encoding caveat:

  • Get-Content reads text files into .NET strings, without storing information about the file's original character encoding.

  • PowerShell's file-writing cmdlets such as Set-Content and Out-File / > operate on .NET strings and by default use their default character encoding (which is fixed and unrelated to any input):

    • In Windows PowerShell, it is ANSI for Set-Content and "Unicode" (UTF-16LE) for Out-File / > (other cmdlets may have different defaults - see the bottom section of this answer).

    • More sensibly, PowerShell (Core) 7+ now uses the same default character encoding for all cmdlets, which is BOM-less UTF-8 (in Windows PowerShell you can use -Encoding utf8, but you always get a BOM - see this answer for a workaround).

  • Upshot:

    • You may have to use the -Encoding parameter to get the desired output character encoding.

    • If you need to match an input file's encoding, you must know what it is (so you can pass its name to -Encoding).
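
As a rough sketch of the upshot, assuming you know the file at hand is UTF-8 encoded, you can pass -Encoding explicitly on both the read and the write. The file name, its content, and the search and replacement strings below are hypothetical:

```powershell
# Assumption: the file is known to be UTF-8; all names and values here
# are hypothetical placeholders for illustration.
$path = [IO.Path]::Combine([IO.Path]::GetTempPath(), 'encoding-demo.config')
$e = [char]0xE9   # a non-ASCII character (e with acute accent)
$oldString = 'old'; $newString = 'new'
Set-Content -LiteralPath $path -Value "name=caf$e; mode=old" -Encoding utf8

# Read with an explicit encoding rather than relying on the default...
$content = Get-Content -Raw -LiteralPath $path -Encoding utf8
# ...and write back with the same explicit encoding.
$content.Replace($oldString, $newString) |
    Set-Content -NoNewline -LiteralPath $path -Encoding utf8

$result = Get-Content -Raw -LiteralPath $path -Encoding utf8
$result

Remove-Item -LiteralPath $path   # Clean up the demo file.
```

As noted above, -Encoding utf8 writes a BOM in Windows PowerShell and no BOM in PowerShell (Core) 7+, so the bytes on disk can still differ between the two editions.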

mklement0
  • 1
    I see, @taraloca: you've run into a _bug_ in _Windows PowerShell_ (since fixed in PowerShell (Core) 7+), where combining `-LiteralPath` with `-Include` causes the latter to be ignored. Please see my update, which restores use of `-LiteralPath` and combines it with `-Filter` instead, which is actually better, because it is faster. Yes, if you let the user pick an arbitrary, specific folder - which by definition you want to use _literally_ - and its path happens to contain `[` and `]`, you would run into trouble if you didn't use `-LiteralPath`, because `-Path` would _interpret it_ as a wildcard. – mklement0 Aug 07 '23 at 19:35