I'm fairly new to PowerShell and am trying to find the name of a directory that matches multiple criteria from an ArrayList.

This is the code I have so far:

    $InputAddress = Read-Host -Prompt 'Enter in a job address'
    # Remove any non alphanumeric characters and white spaces
    $InputAddress = ($InputAddress -replace '[^a-zA-Z\d\s]') 
    # Remove any double spaces and split to separate into an array
    $InputAddress = [System.Collections.ArrayList]($InputAddress -replace '  ',' ').split(" ")
    while ($InputAddress -contains "USA")
    {
        $InputAddress.Remove("USA")
    }

    $srcRoot = (Get-ChildItem -Path 'C:\Users\' -Filter "Box" -Recurse -Directory -Depth 1 -ErrorAction:Ignore).Fullname+'\'
    $JobRoot = (Get-ChildItem -Path $srcRoot -Filter "*Backup*" -Recurse -Directory -Depth 0).Fullname

    $JobFolder = (Get-ChildItem -Path $JobRoot -Filter "$InputAddress" -Recurse -Directory -Depth 0)
    Write-Output $JobFolder

The address entered at the Read-Host prompt is "1234 street - cityname, CA USA", which the script turns into an ArrayList of:

1234
street
cityname
CA

The intended folder that would need to be found is named "1234 street, cityname, CA - ABC Company"

For background: this is PowerShell 5.1 on Windows 10.

Could someone help me understand how to include all criteria when performing the Get-ChildItem? My current best guess is that either wildcards (*) are needed, or some kind of -and to include all items.

Edit: Some more background on the folders that are named as addresses: every folder name starts with the street number, then the street name, then sometimes the state/province. Ideally, the street number would narrow down the set of folders, and then the street name would narrow it down to one folder. Basically, each ordered element could narrow the search further until only one folder is returned.
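
The narrowing described in the edit could be sketched roughly as follows. This is only an illustration, not code from the thread: `Select-NarrowedName` is a hypothetical helper, the sample folder names are made up, and matching each element as a whole word (`\b`) is an assumption. It works on plain name strings; in the real script the names would presumably come from something like `(Get-ChildItem -Path $JobRoot -Directory).Name`.

```powershell
# Hypothetical helper: narrow a list of folder names one address element at a time.
function Select-NarrowedName {
    param(
        [string[]]$Names,   # candidate folder names
        [string[]]$Words    # ordered address elements, e.g. 1234, street, cityname, CA
    )
    $candidates = $Names
    foreach ($word in $Words) {
        # Keep only the names that contain the current element as a whole word.
        # [regex]::Escape guards against regex metacharacters in user input.
        $pattern   = '\b' + [regex]::Escape($word) + '\b'
        $remaining = @($candidates | Where-Object { $_ -match $pattern })
        if ($remaining.Count -eq 0) { break }   # element matches nothing: keep the previous set
        $candidates = $remaining
        if ($candidates.Count -eq 1) { break }  # narrowed down to a single folder
    }
    $candidates
}

# Made-up folder names, for illustration only.
$names = '1234 street, cityname, CA - ABC Company',
         '1234 avenue, othertown, CA - XYZ Inc',
         '5678 street, cityname, CA - DEF LLC'

Select-NarrowedName -Names $names -Words '1234', 'street', 'cityname', 'CA'
```

With these sample names, the street number leaves two candidates and the street name leaves one, so the loop stops early. If an element matches nothing, the sketch keeps the previous candidate set instead of returning an empty result, which is one possible way to tolerate a partial match.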

Screamcheese
  • Does it have to match __all elements__ in the ArrayList or can be matched partially? – Santiago Squarzon Feb 02 '22 at 21:59
  • @SantiagoSquarzon, one answer to your question: if it can match ***as many elements out of the total number of elements*** as possible, that would be acceptable as a partial match, e.g. a 75% match of the total ArrayList. If you mean each element being a partial match, such as "123" matching because it is found in the string "1234", that was the original goal, but for all of the elements in the ArrayList. – Screamcheese Feb 02 '22 at 22:09
  • What about using Get-ChildItem -Filter to get everything starting with the number - something like "$number*" or "$($number)*", but then use the pipeline to pass to Where-Object { $_.Name -match 'Regular Expression' }? My thinking is that the filter in Get-ChildItem will be much faster than regular expressions are, but if the regex only has to process a handful of remaining folder names, it shouldn't be an issue. You can go to regex101.com and modify this regular expression until it does the job: ^\s*1234.*\sstreet.+cityname.+CA\s.+$ – Darin Feb 03 '22 at 04:29
  • I think you want a really fancy regex query. Upside is it would be super fast as you'd just need one -match operator and no nested loops. Downside is you'd need to sanity check your user input to make sure all special chars are properly escaped and formulating said query would be a complicated bit of code. See this post for some inspiration. https://stackoverflow.com/questions/3041320/regex-and-operator – mOjO Feb 03 '22 at 09:03
  • I have a question on performance. Do you think you will have something large, as in 10,000 addresses/folders? My network knowledge is limited to my experience with VPN from home to work and the LAN at work. But I can tell you that if I was home, mapped a network drive, and did a DIR piped into a text file in a command prompt of a folder with 10,000 items in it, it wouldn't be fast. If you hit performance issues, and worst comes to worst, you may need to do something along the lines of using Enter-PSSession to connect to a virtual server your ITS sets up. – Darin Feb 03 '22 at 12:32
  • I agree with mOjO on the input being a big problem. You can split the $InputAddress by spaces and assign it to 5 variables [$A, $B, $C, $D, $E = $InputAddress -split '\s+'], then with good user input only $A~$D will have content and $E will be empty. But you will have a boatload of other problems. Consider Kansas City or Los Angeles: two-word city names with a space in them. I don't think your current code addresses that small problem, let alone the user who was up till 3 AM with a sick child and lost their glasses. – Darin Feb 03 '22 at 12:42
  • Take a look at the answer on this one: https://stackoverflow.com/questions/11160192/how-to-parse-freeform-street-postal-address-out-of-text-and-into-components – Darin Feb 03 '22 at 14:56
  • @Darin, the answer from stackoverflow you linked is hilarious but extremely informative. Thankfully I believe my goal is simpler. The address strings are not needed to "map" any sort of address but to find, match, and return a folder that matches as many significant elements as possible to determine that it is the appropriate result. So formatting the query is mainly to get to the point where a `split` produces a set of strings from the address, i.e. removing any characters such as -, commas, periods, etc., and also removing tags that will 99% of the time never be found within the query, e.g. USA – Screamcheese Feb 03 '22 at 18:59
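
The two suggestions in the comments above (a cheap `-Filter` on the street number first, then a single regex that requires every word) could be combined along these lines. This is a sketch only: `New-AllWordsPattern` is a hypothetical helper, the lookahead approach follows the "regex AND" post linked above, and the `\b` word boundaries are an assumption. Drop them if an element such as "123" should also match inside "1234".

```powershell
# Hypothetical helper: build one regex that requires ALL words, in any order.
function New-AllWordsPattern {
    param([string[]]$Words)
    # One lookahead per word: (?=.*\bword\b). Every lookahead must succeed
    # from the start of the string, which gives an "AND" of all the words.
    $lookaheads = foreach ($w in $Words) {
        '(?=.*\b' + [regex]::Escape($w) + '\b)'
    }
    '^' + ($lookaheads -join '')
}

$words   = '1234', 'street', 'cityname', 'CA'
$pattern = New-AllWordsPattern -Words $words

# Demonstration on a plain string:
'1234 street, cityname, CA - ABC Company' -match $pattern

# Against the file system (assumes $JobRoot from the question's script):
# Get-ChildItem -Path $JobRoot -Directory -Filter "$($words[0])*" |
#     Where-Object { $_.Name -match $pattern }
```

`[regex]::Escape` takes care of the escaping concern raised above, so user input containing characters such as `.` or `(` is treated literally.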

2 Answers

Assuming $JobFolder is the array containing all the folders and $InputAddress is the collection of all of the elements we want to compare against each folder, this is the logic I would use:

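    foreach($dir in $JobFolder) {
        # $skip captures the inner loop's output: $true if a word failed, otherwise $null
        $skip = foreach($word in $InputAddress) {
            if($dir.Name -notmatch $word) {
                $true
                break
            }
        }
        # Emit the folder only when every word matched
        if(-not $skip) { $dir }
    }
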
  1. Outer Loop: Enumerate the directories.
  2. Inner Loop: Enumerate the words we need to match.
  3. Inner Condition: If the Name of the folder fails to match any one of the words, output $true and break out of the inner loop.
  4. Outer Condition: $skip holds whatever the inner loop output. If every word matched, the inner loop output nothing, so $skip is $null and -not $skip is $true; in that case, output the DirectoryInfo object. Otherwise, go to the next iteration.

As you can tell, this code only returns folders that match 100% of the elements in $InputAddress.

Note, this is doing the comparison against the folder's Name. If you need to do the comparison against all the parent folders as well, you would use $dir.FullName instead of $dir.Name.
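
If a partial match is acceptable (a 75% threshold was mentioned in the comments on the question), the same idea can be relaxed by scoring each folder instead of requiring every word. The following is a minimal sketch under that assumption: `Get-MatchScore` is a hypothetical helper, the 0.75 cutoff comes from the comment rather than from any tested requirement, and the sample names are made up.

```powershell
# Hypothetical helper: fraction of the words that appear in a folder name (0.0 to 1.0).
function Get-MatchScore {
    param(
        [string]$Name,      # folder name to test
        [string[]]$Words    # address elements to look for
    )
    if ($Words.Count -eq 0) { return 0.0 }
    # Count the words found in the name; Escape makes each word a literal match.
    $hits = @($Words | Where-Object { $Name -match [regex]::Escape($_) }).Count
    $hits / $Words.Count
}

$words = '1234', 'street', 'cityname', 'CA'

# Made-up folder names, for illustration only.
$names = '1234 street, cityname, CA - ABC Company',
         '1234 street, cityname, NV - ABC Company',
         '5678 road, othertown, NV - DEF LLC'

# Keep the names that match at least 75% of the words.
$names | Where-Object { (Get-MatchScore -Name $_ -Words $words) -ge 0.75 }
```

With DirectoryInfo objects, the equivalent filter would presumably be `$JobFolder | Where-Object { (Get-MatchScore -Name $_.Name -Words $InputAddress) -ge 0.75 }`.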

Santiago Squarzon
  • SantiagoSquarzon, I'm trying to understand, what does `$skip = foreach($word in $InputAddress){ # other functional code here }` do? After some thinking, I'll slightly change the goal in a way that might make this challenge a little easier. The suggested logic did not return anything, even though `$JobFolder` and `$InputAddress` were verified with `Write-Output`. – Screamcheese Feb 03 '22 at 02:28
  • @Screamcheese that's explained in point 3 – Santiago Squarzon Feb 03 '22 at 02:30
  • SantiagoSquarzon, I updated my response right above yours and the original question towards the bottom to hopefully make it easier for the search. Currently the logic is not returning anything even though the `$JobFolder` and `$InputAddress` were verified. – Screamcheese Feb 03 '22 at 02:46
  • @Screamcheese try with `$dir.FullName` instead of `$dir.Name` from my code – Santiago Squarzon Feb 03 '22 at 02:50
  • SantiagoSquarzon, that is correct: "assuming `$JobFolder` is the array containing all the folders and `$InputAddress` is the collection of all of elements we want to compare against each folder". To add to this, this is at a depth level of 0. So I'm still not sure why neither `$dir.FullName` nor `$dir.Name` returns anything. To me, it seems that the line `$skip = foreach($word in $InputAddress){` would never return `$null` if any other folder does not match all of the `$InputAddress` elements. – Screamcheese Feb 03 '22 at 03:12

Wouldn't structuring your data be simpler?

It seems to me that the only use case for this data is to store addresses, find/retrieve addresses, and maybe find addresses that are near each other. In these cases, a folder structure similar to

D:\Addresses\{States}\{Cities|zips}\{Streets}\{Files representing each address}

would seem to be a reasonable solution.

So, if your data is in CSV files, and using the example address, its file's path would be

D:\Addresses\CA\cityname\street\1234.CA.cityname.street.csv

NOTE: The final file will still have the full address, that way if somehow it gets moved, you don't have to struggle to figure out where it came from. And placing the number part at the front of the file's name will probably make it easier to find both visually and by code.
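
As a rough illustration of that layout, a path could be assembled from the address components like this. `Get-AddressFilePath` is a hypothetical helper, and the `D:\Addresses` root is taken from the example path above; nothing here touches the file system, it only builds the string.

```powershell
# Hypothetical helper: build the file path for one address in the proposed layout.
function Get-AddressFilePath {
    param(
        [string]$Root = 'D:\Addresses',   # root folder from the example above
        [string]$State,
        [string]$City,
        [string]$Street,
        [string]$Number
    )
    # The file name repeats the full address so a moved file can still be identified.
    $fileName = '{0}.{1}.{2}.{3}.csv' -f $Number, $State, $City, $Street
    # Plain string join; no drive or folder needs to exist for this to work.
    ($Root, $State, $City, $Street, $fileName) -join '\'
}

Get-AddressFilePath -State 'CA' -City 'cityname' -Street 'street' -Number '1234'
# D:\Addresses\CA\cityname\street\1234.CA.cityname.street.csv
```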

Darin
  • Darin, structuring the data would require thousands of folders, files, and memory to be manipulated over an internet connection, in addition to company policies around proposing a rearrangement of the folder structure. The reason the structure is the way it currently is, is so that it is "easier" to find the folder named as an address, starting with the number. – Screamcheese Feb 03 '22 at 02:26
  • Screamcheese, thank you for the reply. I'm not sure the manipulation over the internet has to be a problem. I have a friend who does a lot of work like that by telling JavaScript on the server what he wants; it does the work and gives him back the answers he wants. But if company policies are involved, it really doesn't matter if there is a way to do it. Hope you can find a solution that works really well. – Darin Feb 03 '22 at 03:19