2

I've been given the task of searching for SSNs (and other PII so we can remove it) in our entire file structure, fun I know. So far this script will search thru all .xlsx files in a given directory, but no matter what I try, I cannot for the life of me get the $SearchText variable to work. I have tried so many different deviations of the regex currently displayed, the only regex string that works is straight question marks; "???????????", but that returns entires I'm not looking for.

Any help would be very much appreciated.

Thanks!

$SourceLocation = "C:\Users\nick\Documents\ScriptingTest"
$SearchText2 = "^(?!(000|666|9))\d{3}-(?!00)\d{2}-(?!0000)\d{4}$"
$SearchText = "*"
$FileNames = Get-ChildItem -Path $SourceLocation -Recurse -Include *.xlsx

Function Search-Excel {
    $Excel = New-Object -ComObject Excel.Application
    $Workbook = $Excel.Workbooks.Open($File)
    ForEach ($Worksheet in @($Workbook.Sheets)) {
        $Found = $WorkSheet.Cells.Find($SearchText)
        If ($Found.Text -match "SearchText2") {
            $BeginAddress = $Found.Address(0,0,1,1)
            [pscustomobject]@{
                WorkSheet = $Worksheet.Name
                Column = $Found.Column
                Row =$Found.Row
                Text = $Found.Text
                Address = $File
            }
            Do {
                $Found = $WorkSheet.Cells.FindNext($Found)
                $Address = $Found.Address(0,0,1,1)
                If ($Address -eq $BeginAddress) {
                    BREAK
                }
                [pscustomobject]@{
                    WorkSheet = $Worksheet.Name
                    Column = $Found.Column
                    Row =$Found.Row
                    Text = $Found.Text
                    Address = $File
                }                 
            } Until ($False)
        }
        }
    }
    $workbook.close($false)
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$excel)
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
    Remove-Variable excel -ErrorAction SilentlyContinue

foreach ($File in $FileNames) 
{
    Search-Excel
}

EDIT: Turns out excel has a very limited range of acceptable regex: Acceptable Excel Regex, so I modified the first $Searchtext viarable to just be "*", and the first if statement to match regex outside of excel's search. Now I just need to come up with a crafty regex pattern to filter what I want. The next problem is filtering:

  1. No letters.
  2. Valid SSNs with dashes.
  3. Valid SSNs without dashes. (this part is stumping me, how to search for something that can have dashes, but if it doesn't, it can only be 9 characters long)
  • 1
    Scraping / Farming Social Security Numbers ? `\d\d\d-\d\d-\d\d\d\d` – sln Jan 27 '22 at 19:34
  • That doesn't work either :( – Nick Wittkamp Jan 27 '22 at 19:35
  • Works for my SSN. I've never seen a SSN without dashes – sln Jan 27 '22 at 19:37
  • Just tried that (again), no go. If I had hair, I'd pull it out. – Nick Wittkamp Jan 27 '22 at 19:46
  • 1
    If the search text `"???????????"` works, Excel must not be using traditional regex, as that would either be invalid or match zero characters. That looks more like wildcard syntax. Can you find any documentation on how Excel is interpreting that string you’re passing to `Find`? I found [this documentation for `Range.Find`](https://learn.microsoft.com/en-us/office/vba/api/Excel.Range.Find), but I’m not sure if it’s the same method, and that page doesn’t describe how the `What` parameter is interpreted. – Rory O'Kane Jan 27 '22 at 19:50
  • 1
    Well at least `???-??-????` should work then using wildcards. Kind of an fyi https://forum.aspose.com/t/cells-find-doesnt-work-with-regex/71225 – sln Jan 27 '22 at 20:03
  • Further foundings https://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops – sln Jan 27 '22 at 20:08

1 Answers1

0

It definitely doesn't appear to be regex, but this did work with the dashes. The issues I see with your code is

  1. You define the searchtext outside of the function and don't pass it in
  2. Same with the file names
  3. Your workbook close, com release, gc, etc is outside of your function, so it won't do anything. (except maybe error?)

Here is what I got to work with your code. Now if you have other text that matches the pattern of 3 chars dash 2 chars dash 4 chars, you can easily filter those out afterwards with regex or whatever you like.

$SourceLocation = "C:\Users\nick\Documents\ScriptingTest"
$SearchText = "???-??-????"
$FileNames = Get-ChildItem -Path $SourceLocation -Recurse -Include *.xlsx

Function Search-Excel {
    [cmdletbinding()]
    Param($File,$SearchText)

    $Excel = New-Object -ComObject Excel.Application
    $Workbook = $Excel.Workbooks.Open($File)

    ForEach ($Worksheet in @($Workbook.Sheets)) {
        $Found = $WorkSheet.Cells.Find($SearchText)
        If ($Found) {
            $BeginAddress = $Found.Address(0,0,1,1)
            [pscustomobject]@{
                WorkSheet = $Worksheet.Name
                Column = $Found.Column
                Row =$Found.Row
                Text = $Found.Text
                Address = $File
            }
            Do {
                $Found = $WorkSheet.Cells.FindNext($Found)
                $Address = $Found.Address(0,0,1,1)
                If ($Address -eq $BeginAddress) {
                    BREAK
                }
                [pscustomobject]@{
                    WorkSheet = $Worksheet.Name
                    Column = $Found.Column
                    Row =$Found.Row
                    Text = $Found.Text
                    Address = $File
                }                 
            } Until ($False)
        }
    }

    $workbook.close($false)
    [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject([System.__ComObject]$excel)
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
    Remove-Variable excel -ErrorAction SilentlyContinue
}

foreach ($File in $FileNames) 
{
    Write-Host processing $File.fullname
    Search-Excel -File $File.fullname -SearchText $SearchText
}

Output from test file

WorkSheet : Sheet1
Column    : 2
Row       : 5
Text      : 123-12-5555
Address   : C:\temp\excel2.xlsx

WorkSheet : Sheet1
Column    : 3
Row       : 21
Text      : 586-99-3844
Address   : C:\temp\excel2.xlsx

WorkSheet : Sheet1
Column    : 7
Row       : 28
Text      : 987-65-4321
Address   : C:\temp\excel2.xlsx
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13