Parse data from log file using Regex patterns

Question

I have a log file full of logs of this type :

2020-02-04 04:00:31,503 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceType] [ObjectType] - Information about the log

I would like, using regex patterns, to retrieve the time, the last text in brackets ([ObjectType] in the example) and the information message after the hyphen.

Example of Input :

2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log
2020-02-04 06:50:34,465 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeBob] [ObjectTypeBob] - Information about the Bob log
2020-02-04 07:20:34,677 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeSam] [ObjectTypeSam] - Information about the Sam log

Desired output :

04:00:33,435 [ObjectTypeJohn] - Information about the John log
06:50:34,465 [ObjectTypeBob] - Information about the Bob log
07:20:34,677 [ObjectTypeSam] - Information about the Sam log

So far I have tried this but didn't succeed :

(Get-Content Output.txt) -replace '^(\d\d:\d\d:\d\d).*(\[.*?\] - .*?)$','$1;$2'

Would appreciate any help on this, thanks.

score 2 · Accepted Answer · answered Mar 11 '20 at 14:02

You may use

(Get-Content Output.txt) -replace '^\S+\s+(\S+).*(\[[^][]*])\s*(-.*)', '$1 $2 $3'

See the .NET regex demo

Details

^ - start of the string
\S+ - 1+ chars other than whitespace
\s+ - 1+ whitespaces
(\S+) - Group 1: 1+ chars other than whitespace
.* - any 0+ chars other than newline, as many as possible
(\[[^][]*]) - Group 2: [, 0+ chars other than [ and ] and then a ] char
\s* - 0+ whitespaces
(-.*) - Group 3: - and the rest of the string.
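
As a quick check of the breakdown above, here is a minimal sketch that runs the same pattern against two of the sample lines from the question. The $lines array is only a stand-in for (Get-Content Output.txt), and the variable names are illustrative.

```powershell
# Sample lines copied from the question; a stand-in for (Get-Content Output.txt).
$lines = @(
  '2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log'
  '2020-02-04 06:50:34,465 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeBob] [ObjectTypeBob] - Information about the Bob log'
)

# Same pattern as in the answer. The greedy .* pushes Group 2 to the
# last [...] token that is followed by optional whitespace and a hyphen.
$pattern = '^\S+\s+(\S+).*(\[[^][]*])\s*(-.*)'

# With an array on the left-hand side, -replace is applied to each element.
$result = $lines -replace $pattern, '$1 $2 $3'
$result
```

For these two lines, $result holds the time, the last bracketed token and the message, matching the desired output in the question.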

Wiktor Stribiżew

Thanks works perfectly! Could you please also show how to retrieve the brackets of [ServiceType] instead of [ObjectType] ? – Kimo Mar 11 '20 at 14:18

@Kimo [Like this](http://regexstorm.net/tester?p=%5e%5cS%2b%5cs%2b%28%5cS%2b%29.*%28%5c%5b%5b%5e%5d%5b%5d*%5d%29%5cs%2b%5c%5b%5b%5e%5d%5b%5d*%5d%5cs%2b%28-.*%29&i=2020-02-04+04%3a00%3a33%2c435+%5bz4y6480f-214b-4253-9223-n02542f706ac%5d+%5bINFO%5d+%5bServiceTypeJohn%5d+%5bObjectTypeJohn%5d+-+Information+about+the+John+log&r=%241+%242+%243&o=m). – Wiktor Stribiżew Mar 11 '20 at 14:19
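
URL-decoding the p= parameter of that link gives the pattern used below: it requires one more bracketed token, left uncaptured, between Group 2 and the hyphen. A minimal sketch of applying it with -replace, assuming the same sample line as in the question ($line and $servicePattern are illustrative names):

```powershell
# One sample line from the question.
$line = '2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log'

# Pattern decoded from the linked demo: Group 2 is now the second-to-last
# [...] token, because another [...] must follow it before the hyphen.
$servicePattern = '^\S+\s+(\S+).*(\[[^][]*])\s+\[[^][]*]\s+(-.*)'

$serviceResult = $line -replace $servicePattern, '$1 $2 $3'
$serviceResult
```

For this line, $serviceResult contains [ServiceTypeJohn] instead of [ObjectTypeJohn].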

mklement0 · Answer 2 · 2020-03-11T14:21:44.823

As an alternative to a regex solution, consider use of the unary form of the -split operator, which makes for a conceptually simpler solution:

(Get-Content Output.txt).ForEach({ 
  # Split line into an array of fields by whitespace.
  $fields = -split $_ 
  # Extract the fields of interest by index and re-join with spaces.
  $fields[1, 5 + 6..($fields.Count-1)] -join ' ' 
})

The unary form of -split behaves similarly to the Unix awk utility, in that it tokenizes a line by runs of whitespace, ignoring leading and trailing whitespace.

Note that the solution above relies on the fields before the - not containing whitespace themselves, which is true for the sample input.
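
To see where the indices 1, 5 and 6.. come from, here is a small sketch that lists the fields of one sample line next to their positions. The $line and $picked variables are illustrative and not part of the solution above.

```powershell
# One sample line from the question.
$line = '2020-02-04 04:00:33,435 [z4y6480f-214b-4253-9223-n02542f706ac] [INFO] [ServiceTypeJohn] [ObjectTypeJohn] - Information about the John log'

# Unary -split: tokenize by runs of whitespace.
$fields = -split $line

# List each field with its index: 1 is the time, 5 is [ObjectTypeJohn],
# 6 is the hyphen, and 7 onward are the words of the message.
for ($i = 0; $i -lt $fields.Count; $i++) { '{0}: {1}' -f $i, $fields[$i] }

# Same index expression as above: (1, 5) + (6..last), re-joined with spaces.
$picked = $fields[1, 5 + 6..($fields.Count - 1)] -join ' '
$picked
```

Because the message words are re-joined with single spaces, any run of several spaces inside the message is collapsed to one.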
