How to count duplicate log entries

Question

I want to analyse a log file and count recurring log entries.

I saw this answer, but every log entry is unique because of the time stamp.

If the log entry is of the format

Time stamp: [log message]

How do I remove the start of the line up to the [colon][space] so I can count them? I am guessing a sed command might do it?

[edit]

Sadly, that was an oversimplification of the log on my part. An example of the log:

Jun 27 20:39:26 emonpi systemd[1]: Starting Clean php session files...
Jun 27 20:39:26 emonpi systemd[1]: Started Clean php session files.
Jun 27 21:09:25 emonpi systemd[1]: Starting Clean php session files...
Jun 27 21:09:26 emonpi systemd[1]: Started Clean php session files.

where the delimiter to split on would be the first ": " (colon followed by a space).

I want to count how often each of these messages occurs in the log file.

What environment and programming language are you using? The first thing that comes to mind is [RegExp](https://en.wikipedia.org/wiki/Regular_expression). — Pyromonk, Jun 24 '19 at 11:14
Linux - command line. Yes I'm aware of RegExs but far from an expert in them. — Brian, Jun 24 '19 at 12:32

score 0 · Answer 1 · answered Jun 26 '19 at 03:52


Assuming every log message is within square brackets and the name of the file is "file.log", this will give you the desired output:

grep -E -o '\[.+?\].*' file.log | sort | uniq -c

For future use cases, I recommend you familiarise yourself with regular expressions. They are very helpful.
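For the syslog-style sample added to the question, where the message follows the first colon-plus-space rather than sitting in square brackets, the same sort-and-count idea can be sketched with awk. This is only an illustration: the temporary file and the variable names `log` and `counts` are assumptions, not part of the original answer.

```shell
# Hypothetical sample file holding the four log lines quoted in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 27 20:39:26 emonpi systemd[1]: Starting Clean php session files...
Jun 27 20:39:26 emonpi systemd[1]: Started Clean php session files.
Jun 27 21:09:25 emonpi systemd[1]: Starting Clean php session files...
Jun 27 21:09:26 emonpi systemd[1]: Started Clean php session files.
EOF

# index() finds the first ": " in the line; the colons inside the
# time stamp are skipped because no space follows them.
# substr() keeps everything after that delimiter.
# Lines that contain no ": " at all are dropped.
counts=$(awk '{ i = index($0, ": "); if (i > 0) print substr($0, i + 2) }' "$log" \
  | sort | uniq -c)

printf '%s\n' "$counts"
rm -f "$log"
```

`sort` has to run before `uniq -c`, because `uniq` only merges lines that are adjacent.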


Pyromonk


I have amended the question with a better example of the log file (just a standard Linux log format). I do understand RegEx (not an expert) but do not know how to feed the recurring bit (after the first colon space delimiter) into a sort & count. – Brian Jun 27 '19 at 21:50

score 0 · Answer 2 · answered Jun 27 '19 at 22:15


Thanks to @pyromonk who pointed me in the right direction.

grep -E -o ':\s.*' syslog | sort -nr | uniq -cd

did what I needed. As not all messages are formatted the same way, I had to run this with several different regexes.

What I learnt is that the -o option makes grep pass only the matched part of each line on to sort and uniq. The output did not come out as an ordered list, but it did count each repeated log message.
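The unordered output is expected, since `uniq -c` prints groups in the order they arrive from `sort` and not by count. A minimal sketch of ranking by frequency with a second, numeric sort, assuming GNU grep (for `\s`) and a hypothetical temporary file holding three of the sample lines from the question:

```shell
# Hypothetical sample file: three of the log lines quoted in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Jun 27 20:39:26 emonpi systemd[1]: Starting Clean php session files...
Jun 27 20:39:26 emonpi systemd[1]: Started Clean php session files.
Jun 27 21:09:25 emonpi systemd[1]: Starting Clean php session files...
EOF

# 1. grep -o prints only the matched text, starting at the first ": ".
# 2. sort places identical messages next to each other, which uniq requires.
# 3. uniq -c prefixes each distinct message with its count.
# 4. sort -rn orders the result by that count, highest first.
ranked=$(grep -E -o ':\s.*' "$log" | sort | uniq -c | sort -rn)

printf '%s\n' "$ranked"
rm -f "$log"
```

Adding `-d` to `uniq -c`, as in the command above, would hide messages that occur only once. `sort -u` is best avoided in this pipeline, because it removes duplicates before `uniq` can count them.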


Brian


I am glad I was able to be of some help. I am surprised the output didn't come out sorted... It did for me. Using `sort -u` might help? – Pyromonk Jun 28 '19 at 00:11
