Context
When filtering, it's generally best to push the filter as far upstream as possible to get good performance. For example, if I were using PowerShell to get all text files from a directory, Get-ChildItem 'c:\temp\' -Filter '*.txt' would be preferable to Get-ChildItem 'c:\temp\' | Where-Object {$_.Name -like '*.txt'}, because the provider discards non-matching files before they reach the pipeline.
However, in some situations the upstream component can't express the filter we need. For example, to find any image file we would either have to make multiple calls to Get-ChildItem, passing a different value to -Filter for each type (which traverses the directory multiple times and can return the same file more than once if it matches several filters), or perform the filtering downstream.
If I were searching for image files (for this specific example, let's say that means '*.png', '*.gif', '*.jpg', '*.jpeg'), one approach would be to send '*.*g*' as the filter to the provider, so that we eliminate a lot of candidates early on, and then filter for the specific extensions we're interested in downstream.
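To illustrate the two-stage idea described above, here is a minimal sketch. It assumes the Windows FileSystem provider and uses 'C:\temp' purely as a placeholder directory. The coarse mask is hand-written here, not derived automatically.

```powershell
# Sketch only: 'C:\temp' is a placeholder directory.
$dir = 'C:\temp'

# The exact rule we actually care about (case-insensitive with -match).
$pattern = '\.(?:png|gif|jpg|jpeg)$'

# Stage 1 (upstream): the provider applies the coarse mask '*.*g*',
# which matches a superset of the wanted files (e.g. it also lets '.log' through).
# Stage 2 (downstream): the regex removes the false positives.
Get-ChildItem -LiteralPath $dir -Filter '*.*g*' -File |
    Where-Object { $_.Name -match $pattern }
```

The coarse mask only has to be a superset of the regex's matches; it can be as loose as needed, since correctness comes entirely from the second stage.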
Question
Is there a known method for deriving, from a regex, a simpler "like" pattern/mask that partially implements it, i.e. one that matches a superset of what the regex matches?
e.g. so I could implement something like this:
Function Get-ImageFiles {
    Param(
        [Parameter(Mandatory)]
        [string]$LiteralPath
        ,
        [Parameter()]
        [string]$Pattern = '\.(?:png|gif|jpg|jpeg)$'
    )
    # ConvertTo-SimpleMask is the hypothetical function this question asks about
    $simpleMask = ConvertTo-SimpleMask -RegexPattern $Pattern
    [System.IO.Directory]::EnumerateFiles($LiteralPath, $simpleMask) |
        Select-String -Pattern $Pattern -Raw
}
# for '\.(?:png|gif|jpg|jpeg)$' simpleMask would be '*.*g*'
# for '\.(?:jpg|jpeg)$' simpleMask would be '*.jp*g'
# for '\.(?:png|gif|jpg|jpeg|webp)$' simpleMask would be '*.*'
Note: In this question I've used PowerShell for my example code, but I'm interested in any solution to this "regex to simple filter" problem. This is more a question of curiosity than one specific to the above example use case.