
I queried the registry to get a file path I am looking for. However, I need to go one directory lower to retrieve some file info I need. The pattern I am trying to match against is Officexx or OFFICExx. I can't seem to get the path I need.

Found path from registry: C:\Program Files\Microsoft Office

What I need is: C:\Program Files\Microsoft Office\Officexx

Code:

$base_install_path = "C:\Program Files\Microsoft Office";
$full_install_path = $base_install_path+'\Office[\d+.*]'
Write-Output $full_install_path;  

This returns:

C:\Program Files\Microsoft Office\Office[\d+.*] 

Desired output:

C:\Program Files\Microsoft Office\Office15

Note: this could be any two-digit number.

cynicalswan77
  • `$base_install_path+'\Office[\d+.*]'` is a string concatenation. Have you tried searching the path with `Get-ChildItem`? That should be part of your code. – Santiago Squarzon Jan 07 '22 at 22:36

2 Answers


Building on Santiago Squarzon's helpful comment:

# Find all child directories matching the given wildcard pattern, if any.
Get-ChildItem -Directory -Path "$base_install_path\Office[0-9][0-9]*"
  • Unlike POSIX-compatible shells such as bash, PowerShell does not support automatic globbing of unquoted strings (pattern matching against file names, known as filename expansion) and instead requires explicit use of the `Get-ChildItem` or `Get-Item` cmdlets; e.g., the equivalent of bash command `pattern='*.txt'; echo $pattern` in PowerShell is `$pattern='*.txt'; Get-ChildItem -Path $pattern`.

    • Note that objects describing the matching files or directories are output by these cmdlets; use their properties as needed, e.g. `(Get-ChildItem $pattern).Name` or `(Get-ChildItem $pattern).FullName` (full path). Use `Get-ChildItem $pattern | Get-Member -Type Properties` to see all available properties.
  • The -Path parameter of these cmdlets expects a PowerShell wildcard expression to perform the desired matching, and the expression in the command at the top matches exactly two digits ([0-9][0-9]), followed by zero or more characters (*), whatever they may be (potentially including additional digits).

    • Note: Only PowerShell's wildcard language - as accepted by the -Path and -Include / -Exclude parameters (and in many other contexts) - supports character ranges (e.g. [0-9] to match any decimal digit) and sets (e.g. [._] to match either . or _). By contrast, Get-ChildItem's -Filter parameter uses the wildcard language of the file-system APIs (as cmd.exe does), which does not support them, and additionally exhibits legacy quirks - see this answer for more information.

    • While PowerShell's wildcard character ranges and sets fundamentally work the same as in regexes (regular expressions, see below), regex-specific escape sequences such as \d are not supported, and you generally cannot quantify them; that is, something like [0-9] only ever matches exactly one digit.
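
The wildcard behavior described above can be tried without touching the file system, because the `-like` operator applies the same PowerShell wildcard language to plain strings. The following is a minimal sketch; the sample names are made up for illustration and the variable names are hypothetical.

```powershell
# -like applies PowerShell's wildcard language to plain strings (case-insensitively).
$pattern = 'Office[0-9][0-9]*'

$twoDigits  = 'Office15'   -like $pattern   # two digits: matches
$upperCase  = 'OFFICE16'   -like $pattern   # -like ignores case: matches
$fourDigits = 'Office2019' -like $pattern   # the trailing * absorbs the extra digits: matches
$oneDigit   = 'Office1'    -like $pattern   # only one digit: no match

# \d is not a wildcard escape sequence, so it does not stand for a digit here.
$regexEscape = 'Office15' -like 'Office\d\d*'

"$twoDigits $upperCase $fourDigits $oneDigit $regexEscape"
# Output: True True True False False
```

Note that `Office2019` matches: the wildcard pattern guarantees at least two digits, not exactly two.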


Given that wildcard patterns support only one, non-specific duplication construct, namely the aforementioned *, matching a specific range of repetitions (such as 1 or 2 digits at most) or a specific count (such as exactly two digits) requires post-filtering based on a regex (which is what you tried to use):

# Find all child directories matching the given regex, if any.
# Matches 'Office' at the start of the name (^),
# followed by 1 or 2 ({1,2}) digits (\d),
# not followed by a further digit ((?!\d), a negative lookahead).
Get-ChildItem -Directory -LiteralPath $base_install_path |
  Where-Object Name -match '^Office\d{1,2}(?!\d)'
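
To compare the wildcard approach with regex post-filtering side by side, here is a hedged sketch that runs both against a throwaway directory tree. The temporary folder, the sample directory names, and the exactly-two-digits regex `^Office\d{2}$` are assumptions made for illustration; they are not part of the original question.

```powershell
# Hypothetical sandbox: a throwaway directory standing in for $base_install_path.
$base_install_path = Join-Path ([System.IO.Path]::GetTempPath()) ("office-demo-" + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $base_install_path
foreach ($name in 'Office15', 'Office2019', 'Office1', 'Templates') {
    $null = New-Item -ItemType Directory -Path (Join-Path $base_install_path $name)
}

# Wildcard match: 'Office', two digits, then anything (including more digits).
$wildcardHits = (Get-ChildItem -Directory -Path "$base_install_path\Office[0-9][0-9]*").Name |
    Sort-Object

# Regex post-filter: 'Office' followed by exactly two digits and nothing else.
$regexHits = (Get-ChildItem -Directory -LiteralPath $base_install_path |
    Where-Object Name -match '^Office\d{2}$').Name | Sort-Object

# Clean up the sandbox.
Remove-Item -LiteralPath $base_install_path -Recurse

"Wildcard: $($wildcardHits -join ', ')"   # Wildcard: Office15, Office2019
"Regex:    $($regexHits -join ', ')"      # Regex:    Office15
```

The wildcard pattern also picks up `Office2019`, whereas the anchored regex keeps only the two-digit name.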

As for what you tried:

  • [\d+.*] is a regex, but you probably meant \d+.*, i.e. one or more (+) digits (\d) followed by zero or more (*) characters, whatever they may be (.).

  • Inside a character-range/set expression ([...]), +, . and * are used verbatim, i.e. they are not metacharacters and match literal +, . and * characters.
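
The difference between the two regexes can be checked with the `-match` operator. This is a small sketch with made-up input strings and hypothetical variable names; note that `\d` still acts as a digit class inside the brackets, while `+`, `.` and `*` become literals.

```powershell
$intended  = '^Office\d+.*'     # 'Office', one or more digits, then anything
$asWritten = 'Office[\d+.*]'    # 'Office', then ONE character: a digit, +, . or *

$intendedDigits  = 'Office15' -match $intended    # True
$intendedNoDigit = 'OfficeX'  -match $intended    # False: no digit after 'Office'

$classDigit   = 'Office7' -match $asWritten       # True: \d works inside [...]
$classPlus    = 'Office+' -match $asWritten       # True: + is a literal here
$classDot     = 'Office.' -match $asWritten       # True: . is a literal here
$classLetter  = 'OfficeX' -match $asWritten       # False: X is not in the set

"$intendedDigits $intendedNoDigit $classDigit $classPlus $classDot $classLetter"
# Output: True False True True True False
```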

mklement0
  • Many thanks for your explanation. I thought there were only two (2) expression types: 1) "wildcard" (cmd '*', '.', and '?') and 2) "regex". It now appears there are three (3) types: 1) "wildcard" (cmd), 2) "PSWildcard" (adding explicit character ranges), and 3) "regex". Would you say that is a correct view? – lit Jan 08 '22 at 16:47
  • @lit, yes - you can think of it as two wildcard "dialects", with the PowerShell one having more features and not being subject to legacy behavior. Also, the PowerShell dialect is used pervasively throughout PowerShell, not just in the context of the file-system. Many `-Name` parameters support wildcards; for instance, `Get-Process -Name ba*` finds all processes whose names start with `ba`. One downside with respect to file-system use is that `[` and `]` are legitimate characters in file names, requiring you to use the `-LiteralPath` parameter explicitly to match them _verbatim_. – mklement0 Jan 08 '22 at 17:02
  • Based on this, I wonder if a terminology distinction should be made. The output of `help Get-Process -Parameter Name` mentions "wildcard characters." However, it is not clear if this is traditional `cmd` wildcard characters or PSWildcard. – lit Jan 08 '22 at 19:01
  • @lit, the only context in which the file-system wildcard dialect is spoken is the `-Filter` parameter of `Get-ChildItem`. The [docs](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-childitem?view=powershell-7.1&WT.mc_id=ps-gethelp#parameters) currently state, "The filter string is passed to the .NET API to enumerate files. The API only supports `*` and `?` wildcards." (This lacks details, but conveys the most important distinction). In other words: wildcards in any other context refer to the PowerShell dialect. – mklement0 Jan 08 '22 at 19:47
  • This works, but how would I get it to print just the directory, like so: `C:\Program Files\Microsoft Office\Officexx` – cynicalswan77 Jan 10 '22 at 17:56
  • @cynicalswan77, you mean you just want the full path _strings_? `Get-ChildItem -Directory -Path "$base_install_path\Office[0-9][0-9]*" | ForEach-Object FullName` or, as an expression: `(Get-ChildItem -Directory -Path "$base_install_path\Office[0-9][0-9]*").FullName` – mklement0 Jan 10 '22 at 18:27
1
Get-ChildItem -Path 'C:\Program Files\Microsoft Office\' -Directory | 
    Where-Object { $_.Name -match 'Office\d+' }

In your regex, [] is a character class, which means [\d+.*] is not "one or more digits"; it is "a single character that is a digit OR plus OR dot OR asterisk".

PS C:\> "d+\" -match "[\d+]"
True

Not what you were looking for.

TessellatingHeckler
  • Good point re character class, but note that escape sequence `\d` _is_ recognized as such - try `'1' -match '[\d]'` – mklement0 Jan 07 '22 at 23:36