
Goal: for each line in the log, there should be a document in Elasticsearch containing the 'message' (the text after the timestamp). Each document should also contain fields for the project name, plan name, and build # <-- this is where I'm getting stuck.

Example log structure at the beginning (Atlassian Bamboo build logs):

simple 01-Jan-2016 14:26:01  Build TestProj - Framework Code - Build #25 (TST-FC-25) started building on agent .NET Core 2
simple 01-Jan-2016 14:26:01  .NET-related builds, tests and publishing.

I have a grok pattern to extract the fields I want - build name, build number, and project name - and have them as fields in Kibana:

%{NOTSPACE:log_entrytype}%{SPACE}(?<timestamp>(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])-\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b-(?>\d\d){1,2}\s*(?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9]))%{SPACE}Build%{SPACE}%{DATA:BamProjName}%{SPACE}-%{SPACE}%{DATA:BamBuildName}%{SPACE}-%{SPACE}Build%{SPACE}#%{NUMBER:BamBuildNum}

However, I need these fields available in every record/entry in Kibana. With this other grok, I can extract the other lines of the log into a log_message field:

grok {
  match => [
    "message", "%{NOTSPACE:log_entrytype}%{SPACE}(?<timestamp>(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])-\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b-(?>\d\d){1,2}\s*(?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9]))%{SPACE}%{GREEDYDATA:log_message}"
  ]
}

So do I need to somehow combine these two pattern matches into one, using the 'optional' ()? syntax as described here: link?

Is my end goal achievable with Logstash and the grok plugin alone? Can I handle this with some type of variable construct within Logstash? add_field?
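
For example, is this the kind of thing that would work (an untested sketch - I've shortened the timestamp regex to the built-in %{MONTHDAY}-%{MONTH}-%{YEAR}/%{TIME} patterns here just for readability)? grok tries the header pattern first and falls back to the generic one since break_on_match defaults to true, but it still doesn't copy the build fields onto the other lines, which is where I'm stuck:

grok {
  match => {
    "message" => [
      "%{NOTSPACE:log_entrytype}%{SPACE}(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR}%{SPACE}%{TIME})%{SPACE}Build%{SPACE}%{DATA:BamProjName}%{SPACE}-%{SPACE}%{DATA:BamBuildName}%{SPACE}-%{SPACE}Build%{SPACE}#%{NUMBER:BamBuildNum}",
      "%{NOTSPACE:log_entrytype}%{SPACE}(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR}%{SPACE}%{TIME})%{SPACE}%{GREEDYDATA:log_message}"
    ]
  }
}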

***NOTE: I'm using Filebeat for shipping the logs, and Elastic does not recommend the multiline codec in that case, so I'm curious what my other options are.

1 Answer

You need to work with multiline events; have a look at the official documentation: https://www.elastic.co/guide/en/logstash/current/multiline.html
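
For example, a minimal sketch (untested, shown with a file input for illustration - the pattern is only a guess at what should start a new event in these logs):

input {
  file {
    path => "/path/to/bamboo/build.log"
    codec => multiline {
      # lines that do not contain the "Build ... - Build #<n>" header are
      # folded into the previous event, so each build becomes one event
      pattern => "Build %{DATA} - %{DATA} - Build #%{NUMBER}"
      negate => true
      what => "previous"
    }
  }
}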

– whites11
  • Since I'm using Filebeat for shipping the logs, Elastic does not recommend the multiline codec plugin. From that link: "If you are using a Logstash input plugin that supports multiple hosts, such as the Beats input plugin, you should not use the Multiline codec plugin to handle multiline events. Doing so may result in the mixing of streams and corrupted event data. In this situation, you need to handle multiline events before sending the event data to Logstash." – JohnZaj Aug 30 '17 at 14:19
  • OK, try having a look here then: https://www.elastic.co/guide/en/beats/filebeat/5.3/multiline-examples.html (I have no experience to share on this). – whites11 Aug 30 '17 at 14:20
  • Will do. However, with Filebeat it's not even clear whether you can have more than one multiline.* expressed in the yml, as I already need one for consolidating multiline stack traces into one message. +1 for whoever points me to that documentation. – JohnZaj Aug 30 '17 at 14:25
  • Not sure multiline events are the answer for this. Looking at using the ruby filter to set the field, if it's not empty, from the grok that looks for the log lines in question. – JohnZaj Aug 31 '17 at 01:00
  • Are the events in the source file on different lines? If so, I guess you have no choice. AFAIK any filter (including ruby) works on a single line and you have no access to other lines. – whites11 Aug 31 '17 at 05:37
  • Taking those two lines at the top of my post as the example: yes, some events fall on different lines (just this one for now, a second one down the road). With the ruby filter, I was thinking of using its Event API to capture the state after a successful grok that picks up the one/first line and creates fields from it - something like @@projName = event['BamProjName'] - and would likely have to fight through concurrency issues. And then the question still remains - one and only one grok? Or do I need two now - a second for the other 99.9% of log lines, maybe one that runs only if _grokparsefailure? (Rough sketch of the idea below the comments.) – JohnZaj Aug 31 '17 at 13:44
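
A rough, untested sketch of that ruby-filter idea - remember the fields from the build header line (after the grok above has extracted them) and copy them onto every later event. It assumes the newer event API (event.get / event.set) and a single pipeline worker (-w 1) so events stay in order; otherwise the remembered fields could be read before the header line has been processed:

filter {
  ruby {
    init => "@build_fields = {}"
    code => "
      if event.get('BamProjName')
        # header line: remember the build fields for the lines that follow
        @build_fields = {
          'BamProjName'  => event.get('BamProjName'),
          'BamBuildName' => event.get('BamBuildName'),
          'BamBuildNum'  => event.get('BamBuildNum')
        }
      else
        # ordinary log line: copy the remembered fields onto this event
        @build_fields.each { |k, v| event.set(k, v) }
      end
    "
  }
}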