52

I am trying to remove all the lines from a text file that contain a partial string, using the PowerShell code below:

 Get-Content C:\new\temp_*.txt | Select-String -pattern "H|159" -notmatch | Out-File C:\new\newfile.txt

The actual string is H|159|28-05-2005|508|xxx. It repeats in the file multiple times, and I am trying to match only the first part, as specified above. Is that correct? Currently the output is empty.

Am I missing something?

Peter Mortensen
user3759904

7 Answers

58

If you want to write the result back to the same file, you can do the following:

Set-Content -Path "C:\temp\Newtext.txt" -Value (get-content -Path "c:\Temp\Newtext.txt" | Select-String -Pattern 'H\|159' -NotMatch)
Peter Mortensen
Samselvaprabu
  • This is exactly what I wanted. Thx @Samselvaprabu! – jyao Mar 25 '19 at 04:38
  • For large text files, this method is quite slow. Do you have any ideas how to improve the performance? – Gora Nov 15 '21 at 08:52
  • @Gora for debugging perf, simplifying the regex or using simpler string matching might help. Otherwise, I'd abandon powershell (who knows when powershell buffers vs. streams) and try native system commands like `grep` or `findstr.exe` which are plenty fast. – Carl Walsh Jul 27 '22 at 19:13
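
On the performance question, one option that stays within PowerShell is to stream the file with .NET and use a plain substring test. This is only a sketch under assumptions: the temp-file paths and sample lines are made up for the demonstration, `[System.IO.StreamWriter]::new` needs PowerShell 5 or later, and whether it is actually faster for a given file is worth measuring rather than assuming.

```powershell
# Hypothetical paths for illustration; .NET methods need full paths
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'large-demo-in.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'large-demo-out.txt'
Set-Content -Path $inPath -Value 'keep 1', 'H|159|28-05-2005|508|xxx', 'keep 2'

$writer = [System.IO.StreamWriter]::new($outPath)
try {
    # ReadLines enumerates the file lazily, one line at a time
    foreach ($line in [System.IO.File]::ReadLines($inPath)) {
        # Literal, case-sensitive substring test; no regex involved
        if (-not $line.Contains('H|159')) { $writer.WriteLine($line) }
    }
}
finally {
    # Flush and close the output file even if something fails
    $writer.Dispose()
}

$kept = Get-Content -Path $outPath
```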
31

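Escape the | character. Note that in a regular expression the escape character is the backslash. A backtick inside a single-quoted PowerShell string is a literal character, so 'H`|159' would be read as "H` or 159" and would also remove any line that merely contains 159.

get-content c:\new\temp_*.txt | select-string -pattern 'H\|159' -notmatch | Out-File c:\new\newfile.txt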
Fourkeys
  • Warning - I used this to attempt to update a file in-place and the file was deleted. – alex Jun 14 '19 at 14:53
  • Long lines get sliced with `Out-File`, I resolved by using `Set-Content` instead, same syntax – Dariopnc Aug 20 '21 at 15:23
  • `Out-File` adds empty lines to the output (for non matching lines), but `Set-Content` doesn't, which I guess is indeed the desired behaviour. – Fuujuhi Mar 08 '22 at 12:31
7

Another option for writing to the same file, building on the existing answers. Wrap the read-and-filter pipeline in parentheses so that it runs to completion before any content is sent to the file.

(get-content c:\new\sameFile.txt | select-string -pattern 'H\|159' -notmatch) | Set-Content c:\new\sameFile.txt
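
The effect of the parentheses can be tried safely on a scratch file. The sketch below is illustrative only: the temp-file name and sample lines are invented, and it filters with `Where-Object` and `-notmatch` instead of `Select-String` purely to keep the example short.

```powershell
# Hypothetical scratch file, created only for this demonstration
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'sameFile-demo.txt'
Set-Content -Path $path -Value 'keep 1', 'H|159|28-05-2005|508|xxx', 'keep 2'

# The parentheses make Get-Content read the whole file and close it
# before Set-Content opens the same file for writing
(Get-Content -Path $path | Where-Object { $_ -notmatch 'H\|159' }) |
    Set-Content -Path $path

# Read the file back: only the two 'keep' lines remain
$result = Get-Content -Path $path
Remove-Item -Path $path
```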
Robert Brooker
  • In my tests, the brackets do not change anything in the output produced, which makes sense. However using `Out-File` instead of `Set-Content` adds empty lines. – Fuujuhi Mar 08 '22 at 12:29
  • Thanks for the tip on `Out-File`, I have updated it to `Set-Content`. Without the brackets it would be writing to the file at the same time it is reading from it (in this one line example). The brackets force the read operation to complete before it starts writing to it. – Robert Brooker Mar 09 '22 at 09:15
  • Ok! This enforces sequential access, and emulates in-place editing of the file. Now it's clear, good tip! – Fuujuhi Mar 14 '22 at 21:37
  • Cleanest solution imho: read -> filter -> write, and applying parentheses to enforce the order of execution to be able to write to the _same_ file. Thanks a lot, just learned something new. – Ingmar Jan 27 '23 at 18:51
6

You don't need Select-String in this case. Just filter the lines out with Where-Object:

Get-Content C:\new\temp_*.txt |
    Where-Object { -not $_.Contains('H|159') } |
    Set-Content C:\new\newfile.txt

String.Contains does a literal, case-sensitive substring comparison instead of a regex match, so you don't need to escape the pipe character, and it's also faster.
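
A quick way to see the difference between the two kinds of test, using made-up sample strings (the variable names are for illustration only):

```powershell
# Contains is a literal, case-sensitive substring test
$containsExact = 'H|159|28-05-2005'.Contains('H|159')   # True
$containsLower = 'h|159|28-05-2005'.Contains('H|159')   # False

# -match is a regex test and ignores case by default
$matchLower  = 'h|159|28-05-2005' -match 'H\|159'       # True
# -cmatch is the case-sensitive variant
$cmatchLower = 'h|159|28-05-2005' -cmatch 'H\|159'      # False
```

So switching from a regex match to Contains also makes the filter case-sensitive, which may or may not matter for the data at hand.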

phuclv
  • I like this solution over Fourkeys' because (unless I'm an idiot) Select-String also adds file name and line number to the output, which isn't desired in my use case. – tolache Jul 05 '21 at 08:29
  • @tolache I don't see that behaviour with `Select-String` here (PS 5). – Fuujuhi Mar 08 '22 at 12:30
  • @Fuujuhi you don't get filenames and line numbers if you pass the input strings through a pipe like above, but normally `Select-String pattern file.txt` will output file name and line numbers by default as you can see from the [man page](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-string?view=powershell-5.1) – phuclv Mar 08 '22 at 13:12
4

The pipe character | has a special meaning in regular expressions. a|b means "match either a or b". If you want to match a literal | character, you need to escape it:

... | Select-String -Pattern 'H\|159' -NotMatch | ...
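
The difference is easy to check against a few sample lines. The `$lines` array below is invented for illustration; `[regex]::Escape` is shown as one way to avoid escaping by hand.

```powershell
# Sample lines, invented for illustration
$lines = @(
    'H|159|28-05-2005|508|xxx',
    'Hello world',
    'D|200|01-01-2006|100|yyy'
)

# Unescaped: 'H|159' means "H or 159", and -notmatch ignores case,
# so every line containing an h or 159 is dropped
$unescaped = @($lines | Where-Object { $_ -notmatch 'H|159' })

# Escaped: only lines containing the literal text H|159 are dropped
$escaped = @($lines | Where-Object { $_ -notmatch 'H\|159' })

# [regex]::Escape builds the escaped pattern from a literal string
$pattern = [regex]::Escape('H|159')
$viaEscape = @($lines | Where-Object { $_ -notmatch $pattern })
```

With the unescaped pattern only the third line survives. If every line in a file contains an H or 159 somewhere, nothing survives at all, which would explain the empty output described in the question.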
Ansgar Wiechers
  • In PowerShell, the escape character is the backtick (`). See [About Escape Characters](http://technet.microsoft.com/en-us/library/hh847755.aspx). – orad Oct 21 '14 at 21:36
  • @orad I am aware of that. In regular expressions, however, the escape character is the backslash. Both work in this case. – Ansgar Wiechers Oct 22 '14 at 07:44
3

This is probably a long way around a simple problem, but it lets me remove lines containing any of a number of matches. I did not have a single partial match that covered every case, and I needed to run it on over 1,000 files. This post helped me get to where I needed to be, thank you.

$ParentPath = "C:\temp\test"
$Files = Get-ChildItem -Path $ParentPath -Recurse -Include *.txt
$Match1 = "matchtext1"
$Match2 = "matchtext2"
$Match3 = "matchtext3"
$Match4 = "matchtext4"
$Match5 = "matchtext5"
$Match6 = "matchtext6"
$Match7 = "matchtext7"
$Match8 = "matchtext8"
$Match9 = "matchtext9"
$Match10 = "matchtext10"

foreach ($File in $Files) {
    $FullPath = $File.FullName
    $OldContent = Get-Content $FullPath
    $NewContent = $OldContent `
    | Where-Object {$_ -notmatch $Match1} `
    | Where-Object {$_ -notmatch $Match2} `
    | Where-Object {$_ -notmatch $Match3} `
    | Where-Object {$_ -notmatch $Match4} `
    | Where-Object {$_ -notmatch $Match5} `
    | Where-Object {$_ -notmatch $Match6} `
    | Where-Object {$_ -notmatch $Match7} `
    | Where-Object {$_ -notmatch $Match8} `
    | Where-Object {$_ -notmatch $Match9} `
    | Where-Object {$_ -notmatch $Match10}
    Set-Content -Path $FullPath -Value $NewContent
    Write-Output $File
}
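
A more compact variant of the same filter, not from the original answer, joins the patterns into one regular expression so that a single Where-Object does the work. It is a sketch under assumptions: the `$Patterns` and `$Combined` names and the sample input are made up, and each pattern is treated as a regex, as in the script above.

```powershell
# Placeholder patterns, as in the script above; each is treated as a regex
$Patterns = 'matchtext1', 'matchtext2', 'matchtext3'

# Join with | so that a line is removed if it matches any of the patterns
$Combined = $Patterns -join '|'

# Sample input, invented for illustration
$OldContent = 'keep this', 'has matchtext2 in it', 'also keep', 'matchtext3'

# One filter replaces the chain of Where-Object calls
$NewContent = @($OldContent | Where-Object { $_ -notmatch $Combined })
```

Inside the foreach loop, the chain of Where-Object calls could then be swapped for the single filter. If the match texts contain regex metacharacters such as | or ., running each one through `[regex]::Escape` before joining keeps them literal.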
Nuno Chaves
  • Thanks for the solution! It looks like you've missed the $Match1 declaration though, and in testing this locally it appears to be adding a blank line onto every file at the end – ChrisFletcher Apr 14 '23 at 14:25
  • Thank you for that, I had $Match2 twice, hence missing $Match1. As for the line at the end, not something I looked into as it does not affect the usability of my files. If I find a way of removing it I'll drop it in comments – Nuno Chaves Apr 15 '23 at 15:14
0

If anyone runs into the following message after doing what Robert Brooker suggested with Set-Content:

*These files have different encodings. Left file: Unicode (UTF-8) with signature. Right file: Unicode (UTF-8) without signature. You can resolve the difference by saving the right file with the encoding Unicode (UTF-8) with signature.*

add -Encoding UTF8 to the Set-Content call, like this:

(get-content c:\new\sameFile.txt | select-string -pattern 'H\|159' -notmatch) | Set-Content c:\new\sameFile.txt -Encoding UTF8
HappyQuest
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Dec 21 '22 at 22:04