I am using PowerShell to read a log file, and I am trying to extract an unknown string that sits between two known words. I need to do this for multiple lines, so I want every matching string returned. I have seen many examples and tried different approaches, but none of them work.

I have used a regex to narrow the log down to the lines I care about, but I am unable to extract the text I want.

$fails =  Select-String -Path 'C:\Users\user\Documents\wsyncmgr.log'  -Pattern "^(?=.*?\bError\b)(?=.*?\bSoftware\b)(?=.*?\bLicense\b)(?=.*?\bTerms\b)(?=.*?\bnot\b)(?=.*?\bdownloaded\b).*$"

This returns:

C:\Users\user\Documents\wsyncmgr.log:7340:Failed to sync update 817ad2a6-3ca7-4fa2-aa32-9b906a2d9fdc. Error: The Microsoft Software License Terms have not been completely 
downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow  $$<SMS_WSUS_SYNC_MANAGER><10-23-2019 
08:31:07.642+300><thread=5916 (0x171C)>
C:\Users\user\Documents\wsyncmgr.log:7341:Failed to sync update 87e13ecb-c669-43be-9e2a-01e567285031. Error: The Microsoft Software License Terms have not been completely 
downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow  $$<SMS_WSUS_SYNC_MANAGER><10-23-2019
08:31:07.643+300><thread=5916 (0x171C)>

etc..

I just want to extract each update's unique ID so I can put them all into a variable and use them later.

The closest I have got is:

$removeFirst = $fails -split "update "
$removeLast = $removeFirst -split ". Error:"
$removeLast[1]

C:\Users\user\Documents\wsyncmgr.log:7341:Failed to sync 
87e13ecb-c669-43be-9e2a-01e567285031
 The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow  $$<SMS_WSUS_SYNC_MANAGER><10-23-2019 08:31:07.643+300><thread=5916 (0x171C)>
C:\Users\user\Documents\wsyncmgr.log:7342:Failed to sync
c1a1ec21-8efc-4cd4-8e85-90a03fc7b0c8
 The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow  $$<SMS_WSUS_SYNC_MANAGER><10-23-2019 08:31:07.644+300><thread=5916 (0x171C)>
C:\Users\user\Documents\wsyncmgr.log:7343:Failed to sync
09dc7113-fa44-4ca8-9d70-ec254d4d2f04
 The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow  $$<SMS_WSUS_SYNC_MANAGER><10-23-2019 08:31:07.644+300><thread=5916 (0x171C)>

But that only removes the words I specify and puts the rest on separate lines. Indexing the array then returns only the one element I specify, but I want all of them. I want to eliminate everything before "update " and everything after ". Error:", leaving only "09dc7113-fa44-4ca8-9d70-ec254d4d2f04" for each line.

Any help would be appreciated; I'm no good with regex.

Nick Reed
  • Given the somewhat ambiguous title, it's worth mentioning: if you want `Select-String` to find _multiple_ (all) matches on a _single line_, add `-AllMatches`. – mklement0 Oct 23 '19 at 17:26

2 Answers

1

If your IDs have the same structure, you can do the following:

$fails | Select-String '(?:[a-z0-9]+-){4}[a-z0-9]+' |
  ForEach-Object { $_.Matches.Value }

Explanation:

  • (?:[a-z0-9]+-) non-capturing group matching one or more alphanumeric characters and ending with a -
  • {4} matches four times
  • [a-z0-9]+ matches one or more alphanumeric characters.

Since multiple lines will match, each matched line is returned as a MatchInfo object. Piping into ForEach-Object lets us access each object as $_, and .Matches.Value retrieves only the text matched by the pattern on that line.
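If the goal is literally "whatever sits between two known words", a capturing group can pull it out in the same pass that filters the lines. The following is only a sketch under assumptions: the sample lines are shortened stand-ins for the log in the question, and the variable names `$logLines`, `$pattern`, and `$failedIds` are made up for illustration. With a real file, `Select-String -Path` could replace the piped array.

```powershell
# Hypothetical sample lines standing in for wsyncmgr.log (shortened from the question).
$logLines = @(
    'Skipped update c7596eab-52eb-43f3-9a0a-bc7b908a9683 - Security Update because it is up to date.'
    'Failed to sync update 817ad2a6-3ca7-4fa2-aa32-9b906a2d9fdc. Error: The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted.'
    'Failed to sync update 87e13ecb-c669-43be-9e2a-01e567285031. Error: The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted.'
)

# Group 1 captures whatever sits between "update " and ". Error:".
# The trailing part of the pattern keeps only the license-terms failures.
$pattern = 'Failed to sync update (.+?)\. Error: .*License Terms have not been completely'

$failedIds = $logLines |
    Select-String -Pattern $pattern |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$failedIds
```

Here `$_.Matches[0]` is the first regex match on the line, and `.Groups[1]` is the first parenthesized group, so the surrounding words are used for matching but left out of the result.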

AdminOfThings
  • This does format it more nicely into two lines, but it still leaves me with the whole log line. Since it is a string of numbers, is there a way to include only the string of numbers I want? `Failed to sync update 817ad2a6-3ca7-4fa2-aa32-9b906a2d9fdc. Error: The Microsoft Software License Terms have not been completely downloaded and~~cannot be accepted. Source: Microsoft.UpdateServices.Internal.BaseApi.SoapExceptionProcessor.DeserializeAndThrow $$<10-23-2019 08:31:07.642+300>` – KeetsScrimalittle Oct 23 '19 at 18:01
  • if `$fails` contains the output you listed under "this returns:", this should work and only display the IDs. – AdminOfThings Oct 23 '19 at 18:11
  • Yes sir, that's exactly what it returns, and when I run your code, it returns what I put in the comments. – KeetsScrimalittle Oct 23 '19 at 18:18
  • I see what you did. I think you crafted that bit for the second set of results I posted, where I used the "-split" operator on the $fails variable. When I run your code against the $removeFirst variable I created, it works perfectly. Thank you. – KeetsScrimalittle Oct 23 '19 at 18:19
  • My `$fails` is an array of strings separated by the newline characters. If `$fails` is a single string, you will need to add the `-AllMatches` parameter to `Select-String`. I basically copied and pasted what you provided. – AdminOfThings Oct 23 '19 at 18:22
  • It is probably all one long string, since I'm using Select-String -Path -Pattern "^(?=.*?\bError\b)(?=.*?\bSoftware\b)(?=.*?\bLicense\b)(?=.*?\bTerms\b)(?=.*?\bnot\b)(?=.*?\bdownloaded\b).*$" on a .log file, and it puts only the lines with failures into the $fails variable. When I run your code on the log file by itself it works like it should, but it grabs every single update in the log. The $fails variable narrows it down to only the lines that include errors. Even when I added the -AllMatches parameter, it still returns everything I originally pasted in my reply. – KeetsScrimalittle Oct 23 '19 at 21:39
  • This solution is based on you already doing some filtering. If you want it to apply to the original file, you need to post example lines from the actual file – AdminOfThings Oct 23 '19 at 21:45
  • I'm fine with whatever works really, but I can't paste that many lines. – KeetsScrimalittle Oct 23 '19 at 21:49
  • Skipped update c7596eab-52eb-43f3-9a0a-bc7b908a9683 - Security Update for Microsoft Office 2010 (KB4011610) 32-Bit Edition because it is up to date. $$<10-10-2019 11:58:45.151+300> Skipped update 8058136c-f813-48c1-b2af-18a8d2507f9b - Security Update for Microsoft Office 2010 (KB4011610) 64-Bit Edition because it is up to date. $$<10-10-2019 11:58:45.167+300>....... But all I care about is lines with – KeetsScrimalittle Oct 23 '19 at 21:54
  • ":Failed to sync update 09dc7113-fa44-4ca8-9d70-ec254d4d2f04. Error: The Microsoft Software License Terms have not been completely", and just the UUID in between on those failure lines. – KeetsScrimalittle Oct 23 '19 at 21:54
0

Use the [regex]::Matches method on your string (you have to provide a regular expression to match on), then go through the returned collection to get each individual match in the string. Here's an example:

$myString = "The quick brown fox"
$myMatches = [regex]::matches($myString, "\w+")
$myMatches.Value

The example above looks for words in the string. The Matches method returns every match, whereas the -match operator doesn't have a global match option (that I could find; I would love to be proven wrong here). The .Value property contains the actual matched text, and the System.Text.RegularExpressions.Match object has other useful members as well.
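Applied to the log from the question, the same method can be combined with a capturing group so that only the text between the two known words is kept. This is a hedged sketch rather than a tested solution for the real file: the here-string is hypothetical sample text (with a real log it might come from `Get-Content -Raw`), and `$text`, `$found`, and `$ids` are illustrative names.

```powershell
# Hypothetical sample text; shortened stand-in for the real log contents.
$text = @'
Failed to sync update 817ad2a6-3ca7-4fa2-aa32-9b906a2d9fdc. Error: The Microsoft Software License Terms have not been completely downloaded
Skipped update 8058136c-f813-48c1-b2af-18a8d2507f9b - Security Update because it is up to date.
Failed to sync update 87e13ecb-c669-43be-9e2a-01e567285031. Error: The Microsoft Software License Terms have not been completely downloaded
'@

# Find every "Failed to sync update <something>. Error:" occurrence in the whole text.
# (\S+?) is capturing group 1: the non-whitespace run between the known words.
$found = [regex]::Matches($text, 'Failed to sync update (\S+?)\. Error:')

# Keep only group 1 from each match.
$ids = foreach ($m in $found) { $m.Groups[1].Value }
$ids
```

The "Skipped update" line is ignored because it does not contain the surrounding words, so no separate filtering step is needed in this sketch.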

EDIT

Your question was a bit ambiguous:

  • I am trying to extract an unknown string between two known words.
  • I just want to extract update unique id

I missed the second part, so let me address that here. Since you want to pick a known pattern (a unique ID) from a string, you can use the -match operator for this:

$fails | ForEach-Object {
  # Emit the matched UUID for every line that contains one
  if ($_ -match '[a-z0-9]{8}-([a-z0-9]{4}-){3}[a-z0-9]{12}') {
    $matches[0]
  }
}

$matches is an automatic variable, a hashtable that PowerShell populates when the -match operator succeeds against a single string. The full match is stored under key 0. Each key past that holds a capturing group's match, but you don't need those here.

The expression I provided is for a well-formed UUID and should work with the .NET regex engine as well as PCRE and JavaScript. Make sure you use a case-insensitive match (PowerShell's -match is case-insensitive by default), or add the uppercase variants if required in other cases.
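If the text between the two known words does not have a predictable shape, the -match operator can still isolate it. The sketch below shows two possible ways, assuming a single sample line copied from the question. `$line`, `$between`, `$named`, and the group name `id` are illustrative choices, not anything the log or PowerShell requires.

```powershell
$line = 'Failed to sync update 09dc7113-fa44-4ca8-9d70-ec254d4d2f04. Error: The Microsoft Software License Terms have not been completely downloaded'

# Option 1: lookbehind and lookahead assert the surrounding words without
# consuming them, so the whole match ($Matches[0]) is only the text in between.
if ($line -match '(?<=update ).+?(?=\. Error:)') {
    $between = $Matches[0]
}

# Option 2: match the surrounding words and read the middle part
# from a named capturing group.
if ($line -match 'update (?<id>.+?)\. Error:') {
    $named = $Matches['id']
}

$between
$named
```

Both variants rely on the lazy quantifier `.+?` so the match stops at the first ". Error:" instead of running to the last one on the line.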

codewario
  • This one just grabs everything you search for, from what I can tell. I don't know the exact string I want to include, just everything I want to exclude. – KeetsScrimalittle Oct 23 '19 at 17:59
  • I provided an example expression. You can plug your own expression in instead. You may need to use a non-capturing group if you want to match only after or before specific other patterns. – codewario Oct 23 '19 at 18:15
  • I updated my answer with more information on extracting a UUID from a string. – codewario Oct 23 '19 at 18:52
  • What is a non-capturing group? – KeetsScrimalittle Oct 23 '19 at 21:41
  • They are used to match patterns that come before or after other patterns. [Here is an SO question that explains groups in more detail](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions). – codewario Oct 24 '19 at 13:12