I'm trying to extract a single word from a line of text. As I understand it, Powershell regexes are almost the same as PCREs (and I have a way of easily testing PCREs). I have a file containing (amongst other things) something like...

ignore=thisline
username=symcbean
dontRead=thisEither

And I want to get the value associated with "username".

I know that the LHS of '=' will contain "username", optionally surrounded by whitespace, and the RHS will contain the value I am trying to extract (optionally surrounded by whitespace). The string I am looking for will match \w+, hence:

(?<=username=)\w+ 

works for the case without additional whitespace. But I can't seem to accommodate the optional white space. For brevity I've only shown the case of trying to handle a whitespace before the '=' below:

(?<=username\s*=)\w+   - doesn't match with or without additional space
(?<=username\W+)\w+    - doesn't match with or without additional space
(?<=username[\s=]*)\w+ - doesn't match with or without additional space

However in each case above, the group in the look-behind zero-width assertion (/username\s*=/, /username\W+/, /username[\s=]*/) matches the relevant part of the string.

I'm hoping to get a single value match (rather than array).

jessehouwing
symcbean
  • What's the powershell code you're using? And can you extend the list of examples with more examples of what you expect to match and not match, it sounds like there are a few permutations of whitespace and possibly other options you want to take into account. – jessehouwing Mar 19 '18 at 16:33
  • Guessing I'd expect: `(?<=username\s*=\s*)\w+ ` – jessehouwing Mar 19 '18 at 16:33
  • Powershell uses the same Regex library as .NET. The docs can be found here: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference – jessehouwing Mar 19 '18 at 16:34
  • I would just do this as `$word = ($line -split '=')[-1].Trim()`, where `$line` is your line of text, and `$word` is where your word ends up. (I don't have PCRE entirely memorized, so if `=` is a meaningful character, escape it `\=`). – Jeff Zeitlin Mar 19 '18 at 16:39
  • @jessehouwing: yes, I guessed that too - but we were both wrong :( – symcbean Mar 19 '18 at 16:41
  • Thanks to @Sweeper who suggested using \K - which worked brilliantly in my non-Powershell test rig - but thanks also to jessehouwing for pointing out that \K doesn't work in Powershell. I've voted to undelete Sweeper's answer as it may be helpful to those who have not yet succumbed to the dark side. – symcbean Mar 19 '18 at 16:47
  • Powershell uses .NET regex engine, definitely not PCRE. You should have tagged the question with `regex` tag, just did it for you. I removed the unnecessary PCRE tag. The correct answer is given by @HeedfulCrayon and in the @jessehouwing's comment. – Wiktor Stribiżew Mar 20 '18 at 07:22

2 Answers

4

Meh, you could use regexes but then you would have two problems. This is how I would do it:

# Notice the extra spaces
$initialText = ' username = wombat  '
$userName = $initialText.Split('=')[1].Trim()

Here's how the key line works:

  • The Split() method takes the string $initialText and divides it into an array, eliminating the character passed to the split (treating it as a delimiter). So now, you have an array @(' username ',' wombat  ').
  • You then take the element at index 1 (zero-based) of the array ([1]). This is ' wombat  '.
  • You then call the Trim() method, which gets rid of all the whitespace at the beginning and the end of the string - so you now have 'wombat'...
  • ...which you assign to $userName.
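The steps above can be traced one at a time. This is only a sketch: the intermediate variables `$parts` and `$rawValue` are illustrative names that do not appear in the answer.

```powershell
# Notice the extra spaces
$initialText = ' username = wombat  '

# Split on '=' gives a two-element array: ' username ' and ' wombat  '
$parts = $initialText.Split('=')

# Index 1 (zero-based) is the right-hand side, still padded with whitespace
$rawValue = $parts[1]

# Trim() strips the leading and trailing whitespace
$userName = $rawValue.Trim()
$userName
```

Note that this assumes the value itself contains no '=' character; if it did, Split() would return more than two elements.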

Split would still work; you would just have to find the line that starts with username first. Having said that, here is a regex method:

$initialText = ' username = wombat  '
# -match returns a Boolean; test it so $Matches is only read after a successful match
if ($initialText -match '^.+=\s*(?<username>\w+)\s*$') {
  $username = $Matches.username
}
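For anyone wondering how to dereference the match results: `-match` returns a Boolean and, on success, fills the automatic `$Matches` hashtable. Key `0` holds the whole match and each named group gets its own key. A minimal sketch, where `$line` and the simplified pattern are illustrative rather than taken from the answer:

```powershell
$line = ' username = wombat  '

if ($line -match 'username\s*=\s*(?<username>\w+)') {
    $Matches[0]            # the whole match
    $Matches['username']   # the named group, by key
    $Matches.username      # the same group, using property syntax
}
```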

Or for an entire file:

From the prompt:

Get-Content C:\Path\To\Some\File.txt | %{if($_.Trim().StartsWith('username')){if($_ -match '^.+=\s*(?<username>\w+)\s*$'){$username = $Matches.username; $username}}}

Or if you are doing it in a script:

$fileContents = Get-Content C:\Path\To\Some\File.txt
foreach($line in $fileContents){
  if($line.Trim().StartsWith('username')){
    # Test the result of -match so the Boolean does not leak into the output
    if($line -match '^.+=\s*(?<username>\w+)\s*$'){
      $userName = $Matches.username
    }
  }
}
$userName
EBGreen
  • Yup, that's essentially what I suggested in my comment; I didn't propose it as an actual answer in case there was some mandate for regex use. – Jeff Zeitlin Mar 19 '18 at 16:42
  • But how do I strip off the stuff before and after ' username = wombat ' ? – symcbean Mar 19 '18 at 16:42
  • You don't; you let `Trim()` do it for you. :) – Jeff Zeitlin Mar 19 '18 at 16:43
  • If it is whitespace then yes Trim() will do it. If it is something else then you should have said that in the question. [GIGO](https://en.wikipedia.org/wiki/Garbage_in,_garbage_out) – EBGreen Mar 19 '18 at 16:44
  • (I'm not particularly precious about using a regex - that's how I would solve the problem elsewhere - but I'm still learning powershell and not yet found documentation on how it enumerates the match tree / how to dereference entries) – symcbean Mar 19 '18 at 16:49
  • Apologies @EBGreen, I thought I had explained this in the question - now updated to clarify. – symcbean Mar 19 '18 at 16:52
  • Wouldn't going line by line be a lot slower than doing something like `Select-String -allmatches`? – HeedfulCrayon Mar 19 '18 at 17:43
  • Meh...possibly. At the end of the day it is an admin scripting language. So whichever way you do it that gives a result that fits your needs is by my definition the right way to do it. :) – EBGreen Mar 19 '18 at 17:45
  • @EBGreen it can be more than just an admin scripting language though... I have created a whole application using powershell – HeedfulCrayon Mar 19 '18 at 20:03
  • @HeedfulCrayon That is good. In that case depending on what the app did, how it worked, what sort of environment it ran in, etc. optimizing it might be worth doing. I hold that most of the time though unless you have a problem optimization is not worth the effort until it is worth the effort. – EBGreen Mar 19 '18 at 20:07
  • @EBGreen In my opinion `select-string -allmatches` is much simpler than looping line by line.... and for me, I always try to approach an issue with simplicity first and then optimize it to the point where it isn't overly complicated, and in this case I don't think it is overly complicated – HeedfulCrayon Mar 19 '18 at 20:13
  • @HeedfulCrayon I think explicit foreach blocks are simpler than a long pipe with multiple ForEach-Objects. If it makes you feel better though I will acknowledge publicly that your answer is clearly better than mine and upvote it. – EBGreen Mar 19 '18 at 20:18
  • @EBGreen Sorry, I wasn't trying to attack or get public acknowledgement, we just don't know how many lines are in the OP's file they are reading from. You obviously have more experience than I do, and I wanted to know if there was a performance benefit to the way you did it – HeedfulCrayon Mar 19 '18 at 20:24
  • Pretty sure select string goes through the file one line at a time anyway. Never bench marked it though. – EBGreen Mar 19 '18 at 20:26
  • @EBGreen I just benchmarked it on a random 300KB file of mine. Testing for several matches within the text file, here are the results: **select-string -allmatches**: total milliseconds 43.1 **Get-Content - foreach line**: total milliseconds 245.54 This does, however utilize regex matching for both rather than the string methods you used above – HeedfulCrayon Mar 19 '18 at 20:57
  • Well there you go...see your answer is better. – EBGreen Mar 19 '18 at 21:01
1

This should do the trick if you are looking for multiple usernames in a single file. It puts every matched value into an array of strings. The regular expression below (the one suggested in the comments) should pull out what you want.

[regex]$username = "(?<=username\s*=\s*)\w+"
$usernames = @(Select-String -Path $file -Pattern $username -AllMatches | ForEach-Object {
    $_.Matches | ForEach-Object{
        $_.Value
    }
})

To explain the Select-String cmdlet a little: it returns one MatchInfo object per matching line, and with the -AllMatches switch each of those objects holds every match on its line rather than only the first. Each MatchInfo has a Matches property, which is a collection of Match objects (each with Groups, Captures, and so on). That is why you need ForEach-Object { $_.Matches ... } first, and then, because each Match object has a Value property, | ForEach-Object { $_.Value } inside it.
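To see that structure directly, you can inspect what Select-String returns. This is a rough sketch that writes a hypothetical three-line sample to a temporary file; the file contents and the `$hits` / `$hit` names are assumptions made for illustration.

```powershell
# Hypothetical sample file, created only for this sketch
$file = [System.IO.Path]::GetTempFileName()
Set-Content -Path $file -Value @('ignore=thisline', 'username = symcbean', 'dontRead=thisEither')

$hits = Select-String -Path $file -Pattern '(?<=username\s*=\s*)\w+' -AllMatches

foreach ($hit in $hits) {
    # LineNumber and Line describe where the match was found
    "{0}: {1}" -f $hit.LineNumber, $hit.Line
    # Matches holds one Match object per regex match on that line
    $hit.Matches | ForEach-Object { $_.Value }
}

# Clean up the temporary file
Remove-Item $file
```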

If it is only one username per file, you could just do this per file:

# -Raw reads the whole file as one string instead of an array of lines
$text = Get-Content $file -Raw
[regex]$usernameReg = "(?<=username\s*=\s*)\w+"
$username = $usernameReg.Match($text).Value
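The pattern works here because the .NET regex engine accepts a variable-length pattern (such as `\s*`) inside a lookbehind, which PCRE has traditionally rejected. If you need a pattern that ports to other engines, a plain capture group gives the same value without any lookbehind. The sketch below compares the two; the `$samples` list and the variable names are illustrative assumptions.

```powershell
# Whitespace variants around '=' (illustrative inputs)
$samples = @(
    'username=symcbean',
    'username =symcbean',
    'username = symcbean  ',
    '  username=  symcbean'
)

# .NET allows a variable-length pattern inside (?<=...)
$lookbehind = [regex]'(?<=username\s*=\s*)\w+'
# A capture group avoids lookbehind altogether
$capture = [regex]'username\s*=\s*(\w+)'

foreach ($s in $samples) {
    $fromLookbehind = $lookbehind.Match($s).Value
    $fromCapture = $capture.Match($s).Groups[1].Value
    "{0,-26} -> {1} / {2}" -f "'$s'", $fromLookbehind, $fromCapture
}
```

With the capture-group form the value is read from `Groups[1]` rather than from the match itself, which is the trade-off for not relying on lookbehind.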
HeedfulCrayon
  • Oddly this regex works in Powershell, but not as a PCRE. – symcbean Mar 22 '18 at 10:49
  • @symcbean That's because .NET uses its own library for regex. See here for the differences in functionality https://stackoverflow.com/questions/26504263/what-is-pcre-compatible-syntax-and-is-c-sharp-pcre-compatible – HeedfulCrayon Mar 22 '18 at 14:48