
I am trying to parse a log4net file into JSON with nxlog.

Here's a sample of my log4net output:

2015-01-27 01:06:18,859 [7] ERROR Web.Cms.Content.Base.Taxonomy.TaxonomyDetectionProvider [(null)] - Get taxonomy Type Failed for Tools
2015-01-27 06:34:31,051 [26] ERROR www.Status404 [(null)] - ErrorId: 20150127_102b01c6-3208-48c5-8c8b-ae4f92cf2b20
    UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36
    HostAddress: 192.168.10.2
    RequestUrl: /ErrorPages/404.aspx
    MachineName: QA01
    Raw Url:/undefined/
    Referrer: http://qa1.www.something.com/toolset.aspx

2015-01-27 06:34:33,270 [26] DEBUG Web.Caching.Core.CacheManagerBase [(null)] - Custom CacheProvider:Web.Caching.Core.AppFabricCacheManager,Web.Caching.Core Disabled

I use xm_multiline to capture each log entry:

<Extension multiline>
    Module        xm_multiline
    HeaderLine    /^\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2},\d{3}/
    EndLine       /\r?\n\r?\n^\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2},\d{3}/
</Extension>

HeaderLine is a regex that matches the timestamp at the start of an entry. EndLine is a regex that matches two consecutive newlines followed by the next timestamp. However, nxlog still treats the second and last entries as ONE log entry.

Here's the output:

{  
   "EventReceivedTime":"2015-01-27 01:06:35",
   "SourceModuleName":"log4net",
   "SourceModuleType":"im_file",
   "time":"2015-01-27 01:06:18,859",
   "thread":"7",
   "level":"ERROR",
   "logger":"Web.Cms.Content.Base.Taxonomy.TaxonomyDetectionProvider",
   "ndc":"(null)",
   "message":"Get taxonomy Type Failed for Tools"
}{  
   "EventReceivedTime":"2015-01-27 06:34:35",
   "SourceModuleName":"log4net",
   "SourceModuleType":"im_file",
   "time":"2015-01-27 06:34:31,051",
   "thread":"26",
   "level":"ERROR",
   "logger":"www.Status404",
   "ndc":"(null)",
   "message":"  ErrorId: 20150127_102b01c6-3208-48c5-8c8b-ae4f92cf2b20\r\n  UserAgent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36\r\n  HostAddress: 192.168.10.2\r\n  RequestUrl: /ErrorPages/404.aspx\r\n  MachineName: QA01\r\n  Raw Url:/undefined/\r\n  Referrer: http://qa1.www.something.com/toolset.aspx\r\n\r\n2015-01-27 06:34:33,270 [26] DEBUG Web.Caching.Core.CacheManagerBase [(null)] - Custom CacheProvider:Web.Caching.Core.AppFabricCacheManager,Web.Caching.Core Disabled"
}

I used this to produce that output:

Exec    if $raw_event =~ /^(\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2},\d{3}) \[(\S+)\] (\S+) (\S+) \[(\S+)\] \- (.*)/s \
        { \
            $time = $1; \
            $thread = $2; \
            $level = $3; \
            $logger = $4; \
            $ndc = $5; \
            $message = $6; \
            to_json(); \
        } \
        else \
        { \
            drop(); \
        }

I've also tried tweaking the regex as follows to avoid combining the last two entries into one. However, with this version I no longer get the last entry at all.

Exec    if $raw_event =~ /^(\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2},\d{3}) \[(\S+)\] (\S+) (\S+) \[(\S+)\] \- ([\s\S]*?)(\r?\n\r?\n|$)/ \
        { \
            $time = $1; \
            $thread = $2; \
            $level = $3; \
            $logger = $4; \
            $ndc = $5; \
            $message = $6; \
            to_json(); \
        } \
        else \
        { \
            drop(); \
        }
samy
Nataraki

2 Answers

I would not bother trying to parse your log into JSON; rather, you should produce JSON directly. There are appenders that do exactly that, such as log4net.ext.json:

Extend log4net facility with simple configuration options to create JSON log entries. This is especially handy to pass semantic information to other utilities, such as nxlog, LogStash, GrayLogs2 and similar.

(emphasis mine)

If you need a human-readable version of the log, you can create two loggers that each output one format, but I'm guessing you'll be using nxlog for that anyway.

In my opinion, a regex is not a very good way to get from a free-form log back to a structured log, so you may as well structure it directly.

samy
  • Thank you very much for your response. I have considered that way in the first place. However, I am thinking that as much as possible I wouldn't modify the log4net config and stay as it was setup. – Nataraki Jan 29 '15 at 13:12
  • Did you manage to keep your log4net configuration and parse the output? – samy Mar 12 '15 at 10:19
  • Unfortunately I got no progress yet. – Nataraki Mar 22 '15 at 07:10

I am working on a similar problem. I think you need to remove the EndLine parameter, which leaves:

<Extension multiline>
    Module        xm_multiline
    HeaderLine    /^\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2},\d{3}/
</Extension>

The reason is that the line matched by EndLine is part of the message. That is what I understand from the documentation here: http://nxlog-ce.sourceforge.net/nxlog-docs/en/nxlog-reference-manual.html#xm_multiline

EndLine

This is similar to the HeaderLine directive. This optional directive also takes a string or a regular expression literal to be matched against each line. When the match is successful the message is considered complete and is emitted.

The first message is interpreted correctly because the parser finds the next HeaderLine and therefore closes the first message.

As you can read in the same doc:

Until there is a new header read, the previous message is stored in the buffers because the module does not know where the message ends. The im_file module will forcibly flush this buffer after the configured PollInterval timeout. If this behaviour is unacceptable, consider using some kind of an encapsulation method (JSON, XML, RFC5425, etc) or use and end marker with EndLine if possible.
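To make the header-only behaviour concrete, here is a minimal Python sketch of the grouping logic described in the quoted documentation. It is not nxlog code: the function name `group_entries`, the shortened sample lines, and the final flush (standing in for the PollInterval timeout) are assumptions made for illustration.

```python
import re

# Same timestamp pattern as the HeaderLine directive in the config above.
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")

def group_entries(lines):
    """Group physical lines into log entries; each header line starts a new entry."""
    entries, current = [], []
    for line in lines:
        # A new header closes whatever entry is currently buffered.
        if HEADER.match(line) and current:
            entries.append(current)
            current = []
        current.append(line)
    # Final flush: stands in for the forced flush after PollInterval (assumption).
    if current:
        entries.append(current)
    # Join continuation lines and drop trailing blank lines.
    return ["\n".join(entry).rstrip() for entry in entries]

# Shortened, hypothetical version of the sample log from the question.
SAMPLE = [
    "2015-01-27 01:06:18,859 [7] ERROR A [(null)] - first",
    "2015-01-27 06:34:31,051 [26] ERROR www.Status404 [(null)] - ErrorId: 1",
    "    HostAddress: 192.168.10.2",
    "",
    "2015-01-27 06:34:33,270 [26] DEBUG B [(null)] - last",
]

entries = group_entries(SAMPLE)
for entry in entries:
    print(repr(entry))
```

With the header pattern alone, the blank line ends up at the tail of the second entry and the third entry is emitted separately, so no EndLine is needed to split them in this sketch.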

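In your case, if each multiline entry ends with a blank line, you could try an EndLine that matches that blank line. Since the pattern is matched against one line at a time, it should describe a single empty line rather than two consecutive newlines:

EndLine /^\s*$/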
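Since the documentation quoted above says the pattern is matched against each line, it may be worth checking which patterns can match a single line at all. The Python sketch below is only an approximation of that per-line matching, not nxlog itself; the sample text and variable names are made up for illustration, and whether the reader keeps or strips line terminators is an assumption, so both cases are checked.

```python
import re

TWO_NEWLINES = re.compile(r"\r?\n\r?\n")  # pattern spanning two line breaks
BLANK_LINE = re.compile(r"^\s*$")         # pattern describing one empty line

# Hypothetical input: one multiline entry, a blank line, then the next header.
text = "entry line\r\n    continuation\r\n\r\n2015-01-27 06:34:33,270 next entry\r\n"

# A line-oriented reader hands over one line at a time.
lines = text.splitlines()                        # terminators stripped
lines_keepends = text.splitlines(keepends=True)  # terminators kept

# No single line contains two line breaks, with or without its terminator.
two_newline_hits = [l for l in lines if TWO_NEWLINES.search(l)]
two_newline_hits_keepends = [l for l in lines_keepends if TWO_NEWLINES.search(l)]

# The blank-line pattern matches exactly the empty separator line.
blank_hits = [i for i, l in enumerate(lines) if BLANK_LINE.search(l)]

print(two_newline_hits, two_newline_hits_keepends, blank_hits)
```

Under these assumptions, a two-newline pattern never fires on any individual line, while the blank-line pattern fires on the separator between entries.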

Hope this helps.

PatBriPerso