I have an array of URLs of the form:

$URLs = @("https://somesite.com/folder1/page/1/"
,"https://somesite.com/folder222/page/1/"
,"https://somesite.com/folder222/page/2/"
,"https://somesite.com/folder444/page/1/"
,"https://somesite.com/folder444/page/3/"
,"https://somesite.com/folderBBB/page/1/"
,"https://somesite.com/folderBBB/page/5/")

They always have /page/1/. I need to add (or reconstruct) all missing URLs, from the highest page down to 1, so that the array ends up like this:

$URLs = @("https://somesite.com/folder1/page/1/"
,"https://somesite.com/folder222/page/1/"
,"https://somesite.com/folder222/page/2/"
,"https://somesite.com/folder444/page/1/"
,"https://somesite.com/folder444/page/2/"
,"https://somesite.com/folder444/page/3/"
,"https://somesite.com/folderBBB/page/1/"
,"https://somesite.com/folderBBB/page/2/"
,"https://somesite.com/folderBBB/page/3/"
,"https://somesite.com/folderBBB/page/4/"
,"https://somesite.com/folderBBB/page/5/")

I'd imagine the pseudocode would be something like:

  • For each folder, extract the highest page number:

      hxxps://somesite.com/folderBBB/page/5/

  • Expand this out from (5) to (1):

      hxxps://somesite.com/folderBBB/page/1/
      hxxps://somesite.com/folderBBB/page/2/
      hxxps://somesite.com/folderBBB/page/3/
      hxxps://somesite.com/folderBBB/page/4/
      hxxps://somesite.com/folderBBB/page/5/

  • Output this into an array

Any pointers would be welcome!

Bobby Tables

1 Answer

You can use a pipeline-based solution via the Group-Object cmdlet as follows:

$URLs = @("https://somesite.com/folder1/page/1/"
  , "https://somesite.com/folder222/page/1/"
  , "https://somesite.com/folder222/page/2/"
  , "https://somesite.com/folder444/page/1/"
  , "https://somesite.com/folder444/page/3/"
  , "https://somesite.com/folderBBB/page/1/"
  , "https://somesite.com/folderBBB/page/5/")

$URLs |
  Group-Object { $_ -replace '[^/]+/$' } | # Group by shared prefix
    ForEach-Object {
      # Extract the start and end number for the group at hand.
      [int] $from, [int] $to = 
        ($_.Group[0], $_.Group[-1]) -replace '^.+/([^/]+)/$', '$1'
      # Generate the output URLs.
      # You can assign the entire pipeline to a variable 
      # ($generatedUrls = $URLs | ...) to capture them in an array.
      foreach ($i in $from..$to) { $_.Name + $i + '/' }
    }

Note:

  • The assumption is that the first and last element in each group of URLs that share the same prefix always contain the start and end point of the desired enumeration, respectively.

    • If that assumption doesn't hold, use the following instead:

      $minMax = $_.Group -replace '^.+/([^/]+)/$', '$1' |
                  Measure-Object -Minimum -Maximum
      $from, $to = $minMax.Minimum, $minMax.Maximum
      
  • The regex-based -replace operator is used for two things:

    • -replace '[^/]+/$' eliminates the last component from each URL, so as to group them by their shared prefix.

    • -replace '^.+/([^/]+)/$', '$1' effectively extracts the last component from each given URL, i.e. the numbers that represent the start and end point of the desired enumeration.
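To make the two -replace operations above concrete, here is a minimal sketch that applies each one to a single sample URL. The variable names $url, $prefix and $page are chosen for illustration only and do not appear in the pipeline itself.

```powershell
# Sample input, taken from the question's array.
$url = 'https://somesite.com/folderBBB/page/5/'

# Drop the last path component (and its trailing slash);
# what remains is the prefix that the URLs of one folder share.
$prefix = $url -replace '[^/]+/$'

# Keep only the last path component (the page number)
# and convert the resulting string to an integer.
$page = [int] ($url -replace '^.+/([^/]+)/$', '$1')

$prefix   # https://somesite.com/folderBBB/page/
$page     # 5
```

Since -replace with no replacement operand substitutes the empty string, the first call simply removes the matched text.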


Procedural alternative:

# Build a map (ordered hashtable) that maps URL prefixes
# to the number suffixes that occur among the URLs sharing
# the same prefix.
$map = [ordered] @{}
foreach ($url in $URLs) {
  if ($url -match '^(.+)/([^/]+)/') {
    $prefix, [int] $num = $Matches[1], $Matches[2]
    $map[$prefix] = [array] $map[$prefix] + $num
  }
}

# Process the map to generate the URLs.
# Again, use something like
#    $generatedUrls = foreach ...
# to capture them in an array.
foreach ($prefix in $map.Keys) {
  $nums = $map[$prefix]
  $from, $to = $nums[0], $nums[-1]
  foreach ($num in $from..$to) {
    '{0}/{1}/' -f $prefix, $num  # synthesize URL and output it.
  }
}
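
If the input order cannot be relied on, the procedural approach can track the lowest and highest page number per prefix instead of taking the first and last element. The following is only a sketch under that assumption; the function name Expand-PageUrl and its -Url parameter are hypothetical, and URLs whose last component is not a number are silently skipped.

```powershell
# Hypothetical helper; the name and parameter are illustrative only.
function Expand-PageUrl {
  param([string[]] $Url)

  # Map each URL prefix to the lowest and highest page number seen.
  $range = [ordered] @{}
  foreach ($u in $Url) {
    if ($u -match '^(.+)/(\d+)/$') {
      $prefix, [int] $num = $Matches[1], $Matches[2]
      if (-not $range.Contains($prefix)) {
        $range[$prefix] = @{ Min = $num; Max = $num }
      }
      else {
        $range[$prefix].Min = [Math]::Min($range[$prefix].Min, $num)
        $range[$prefix].Max = [Math]::Max($range[$prefix].Max, $num)
      }
    }
  }

  # Emit every URL from the lowest to the highest page of each prefix,
  # in the order in which the prefixes were first encountered.
  foreach ($prefix in $range.Keys) {
    $from, $to = $range[$prefix].Min, $range[$prefix].Max
    foreach ($num in $from..$to) {
      '{0}/{1}/' -f $prefix, $num
    }
  }
}
```

Calling it as $generatedUrls = Expand-PageUrl -Url $URLs captures the generated URLs in an array.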
mklement0
  • Wow that looks brilliant, I'll give it a try today (and attempt to get my head round it!) and provide update/mark as answered. Many thanks indeed! :) – Bobby Tables Apr 22 '22 at 06:15
  • I'm glad that it worked, @BobbyTables; my pleasure. – mklement0 Apr 22 '22 at 15:17
  • Thanks once again, that was incredibly succinct code and works a treat! Can I ask what resources you'd recommend to a PowerShell newbie? When I use PowerShell (I think of it as bash on steroids, but object-oriented too), it's like walking into a massive hardware store with thousands of tools and fittings, but not knowing which aisle or shelf to approach for the job in hand. Cheers :) – Bobby Tables Apr 22 '22 at 15:28
  • My pleasure, @BobbyTables, and thanks for the nice feedback. A while back I compiled a few learning resources in [this answer](https://stackoverflow.com/a/48491292/45375) - I hope the list is still reasonably current. – mklement0 Apr 22 '22 at 15:34