
I'm running the following script to check a group of files for card numbers. When I run it against a group of 38 files totaling 600 MB, it maxes out the CPU (restricted to 50%) and memory (3.3 GB of 4.0 GB physical).

Looking for ideas on why this may be and how to optimize this.

Thanks!

Get-ChildItem "c:\REGEX\ScanMeFiles\" -Recurse |
Foreach-Object{
    $content = Get-Content $_.FullName
    $outfile = 'c:\regex\results\'+$_.BaseName+'_results.log'
    $content | Where-Object {$_ -match '\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b'} | Set-Content $outfile
}
  • You have back-references inside of back-references. Maybe it's catastrophic backtracking? Try a simpler regex, one that uses less look around. – jpmc26 Mar 03 '15 at 20:47
  • So you are looking for Visa, MasterCard, AMEX and Discover, yes? It looks like you don't care about the presence of hyphens or spaces. Is that correct? If so we could remove the `?:`'s. Also you could look into running multiple background jobs and limiting the number of jobs running at once, like this [answer](http://stackoverflow.com/questions/15580105/powershell-run-multiple-jobs-in-parralel-and-view-streaming-results-from-backgr) (there are others). `Select-String` can process files for patterns as well. Not sure offhand if it is more efficient than what you already have. – Matt Mar 03 '15 at 21:16
  • Hi jpmc26, thanks, but I'm not sure it's the regex that is the problem. First, I should say that everything in that code is copied from the 4 corners of the internet (regex and powershell scripting are foreign languages to me). If I run that regex against a single 3GB file it runs pretty quickly (10 min or so) while running it as I described against multiple files totaling 600mb, I had to kill the process after 50 min. So I'm thinking it's something in the code part. Thanks. – user3757985 Mar 03 '15 at 21:25
  • Hi Matt, you are correct. I don't think I want parallel processing, as it's already consuming max CPU and memory. Not sure how the code I have is running, as I just smooshed that code together from various sources. I think you might be right about Select-String. I'll give that a try. Thanks! – user3757985 Mar 03 '15 at 21:31
  • Matt, select-string was the answer! Here is what I wound up with (not sure how to format this to make it intelligible...): `Get-ChildItem "c:\REGEX\ScanMeFiles\" | Foreach-Object{ $content = $_.FullName; $outfile = 'c:\regex\results\'+$_.BaseName+'_results.log'; $regex = '\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b'; select-string -Path $content -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } | Set-Content $outfile }` Thanks for your help!!! – user3757985 Mar 03 '15 at 21:56

2 Answers


I would make it a little more contained. Do something like this with fewer variables:

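# Collect only the full paths (plain strings), not the whole FileInfo objects
$children = (Get-ChildItem "c:\REGEX\ScanMeFiles\" -Recurse).FullName
foreach($child in $children){
    # $_ is not the file inside a foreach statement, so derive the base name from the path
    $base = [System.IO.Path]::GetFileNameWithoutExtension($child)
    Get-Content $child | ?{$_ -match '\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b'} | Set-Content ('c:\regex\results\'+$base+'_results.log')
}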
Little King
  • While this makes the code more terse I don't see it having any tangible effect on memory consumption. OP might need to consider File Streams – Matt Mar 03 '15 at 21:17
  • It makes a significant difference. I learned this when automating processes against tens of thousands of mailbox and MSOL objects. His approach takes all the objects, stores them in RAM, and processes each entire object one by one. That is going to use more memory than grabbing only the full name as a string, not the entire object, and processing these one at a time with the least amount of information required. – Little King Mar 04 '15 at 13:47

With Matt's help, this is what I came up with. It runs in under 1 minute against my test data. Thanks!

Get-ChildItem "c:\REGEX\ScanMeFiles\" |
Foreach-Object{
    $content = $_.FullName   # path of the file to scan, not its contents
    $outfile = 'c:\regex\results\'+$_.BaseName+'_results.log'
    $regex = '\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b'
    # Select-String reads the file itself; -AllMatches returns every hit on a line
    select-string -Path $content -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } | Set-Content $outfile
}
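
For anyone who wants to try the file-stream route Matt mentioned in the comments instead, here is a rough sketch. I have not benchmarked it against my data, so treat it as an illustration only. `Find-CardNumber` is a made-up helper name, not a built-in cmdlet. The regex is the same one used above. The sketch assumes full paths are passed in, since the .NET stream classes do not follow PowerShell's current location.

```powershell
# Hypothetical helper (not a built-in cmdlet): streams one file line by line
# and writes every regex match to $OutFile, one match per line.
function Find-CardNumber {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,     # full path of the file to scan
        [Parameter(Mandatory = $true)] [string] $OutFile   # full path of the results file
    )
    # Same pattern as above, built once and reused for every line.
    $regex = [regex] '\b(?:3[47]\d|(?:4\d|5[1-5]|65)\d{2}|6011)\d{12}\b'

    $reader = $null
    $writer = $null
    try {
        $reader = New-Object System.IO.StreamReader($Path)
        $writer = New-Object System.IO.StreamWriter($OutFile)
        # ReadLine() returns $null at end of file, so only one line is held at a time.
        while ($null -ne ($line = $reader.ReadLine())) {
            foreach ($m in $regex.Matches($line)) {
                $writer.WriteLine($m.Value)
            }
        }
    }
    finally {
        # Close both files even if reading or matching throws.
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}
```

It would be called once per file from the same kind of loop as above, for example `Find-CardNumber -Path $_.FullName -OutFile $outfile` inside the `Foreach-Object` block, after filtering out directories with `Where-Object { -not $_.PSIsContainer }`. The point of the design is that neither the file contents nor the list of matches is ever collected into a variable.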