macOS, PowerShell Core 6.1

I'm struggling to create the correct regex pattern to find a username string in the middle of a URL. In short, I'm working in PowerShell Core 6.1, pulling down a webpage and scraping out the "li" elements. I write these to a file, so I have a bunch of lines like this:

<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;

The string I need is the "jimmysmith" part, and every line will have a different username, no longer than eight alpha characters. My current pattern is this:

(<(.|\n)+?>)|(&nbsp;) 

and I can use a "-replace $pattern" in my code to grab the "Smith, Jimmy" part. I have no idea what I'm doing, and any success in getting what I did get was face-roll-luck.
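For reference, a minimal sketch of what that -replace call presumably looks like when run against the sample line. The variable names are only illustrative, and the pattern is used without the trailing space:

```powershell
# Sample line from above.
$line = '<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;'

# Matches any tag (lazily, up to the first ">") or the literal &nbsp; entity.
$pattern = '(<(.|\n)+?>)|(&nbsp;)'

# Replacing every match with an empty string leaves only the link text.
$displayName = $line -replace $pattern, ''
$displayName   # Smith, Jimmy
```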

After using several online regex helpers, I'm still stuck on how to get just the string after the third "/", up to but not including the last quote.

Thank you for any assistance you can give me.

JasonH

3 Answers

You could go super-simple,

expand-user/([^"]+)

Find expand-user/, then capture everything up to the next quotation mark.
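As a rough sketch of how this could be used in PowerShell (the sample line is the one from the question; the variable names are only illustrative):

```powershell
# Sample line copied from the question.
$line = '<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;'

# -match fills the automatic $Matches table; index 1 is the first capture group.
if ($line -match 'expand-user/([^"]+)') {
    $username = $Matches[1]
    $username   # jimmysmith
}
```

Since [^"]+ stops at the first quotation mark, anything after the closing quote of the HREF is ignored.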

Evan Knowles
(?:\/.*){2}\/(?<username>.*)"

(?:\/.*) is a non-capturing group that matches a literal / followed by any number of characters

{2} repeats the previous group two times

\/ matches another /

(?<username>.*)" matches everything up to the last " on the line (.* is greedy) and puts it in the username group.

https://regex101.com/r/0gj7yG/1
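A minimal sketch of reading the named group in PowerShell, assuming the sample line from the question (variable names are only illustrative):

```powershell
# Sample line copied from the question.
$line = '<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;'

$pattern = '(?:\/.*){2}\/(?<username>.*)"'

# On a successful -match, named groups appear as keys in the automatic $Matches table.
if ($line -match $pattern) {
    $username = $Matches['username']
    $username   # jimmysmith
}
```

Note that this relies on the HREF quotes being the only quotation marks on the line; because .* is greedy, extra quotes later in the line would change what is captured.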

Although, since each line is presumably identical up until the username:

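$line = '<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;'
# 36 is the index of the first character after "expand-user/"
$start = 36
# Substring takes (startIndex, length), so subtract the start from the closing quote's index
$line = $line.Substring($start, $line.LastIndexOf('"') - $start)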
dave

The answer that worked for me is the one posted by Dave. I saved my scraped details (the lines with "li") to a file by doing:

Get-Content .\list.txt -ReadCount 1000 | ForEach-Object { $_ -match "<li>" } | Out-File .\transform.txt

I then used the method proposed by Dave as follows:

Get-Content .\transform.txt | Select-String -Pattern '(?:\/.*){2}\/(?<username>.*)"' | ForEach-Object { $_.Matches[0].Groups['username'].Value } | Out-File .\final.txt

I had to look up how to pull the named group out, and I used this reference to figure that out: "How to get the captured groups from Select-String?"
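For anyone who wants to try this without the intermediate files, here is a small sketch that feeds strings straight into Select-String. The sample lines (including "janedoe") are made up for illustration and are not my real data:

```powershell
# Hypothetical sample input standing in for the scraped page.
$lines = @(
    '<LI><A HREF="/grouplist/expand-user/jimmysmith">Smith, Jimmy</A>&nbsp;'
    '<P>Some unrelated line</P>'
    '<LI><A HREF="/grouplist/expand-user/janedoe">Doe, Jane</A>&nbsp;'
)

$pattern = '(?:\/.*){2}\/(?<username>.*)"'

# Select-String accepts strings from the pipeline; lines that do not match are dropped.
$usernames = $lines |
    Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches[0].Groups['username'].Value }

$usernames   # jimmysmith, janedoe
```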

JasonH