I am trying to write a script that finds the unique lines (keeping the first occurrence) based on columns/delimiters. In this case, as I understand it, the delimiter is ":".

For example:

May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780852: ndtpq.c(20544): this is the log  
May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780853: ndtpq.c(20544): this is another log  
May 14 00:00:02  SERVER1 ntp[1006]:  ntpd[Info]: 1430748798.780852: ndtpq.c(20544): this is another log  
May 14 00:00:03  SERVER1 ntp[1006]:  ntpd[Info]: 1430748799.780852: ndtpq.c(20544): this is the log  
May 14 00:00:04  SERVER1 ntp[1006]:  ntpd[Info]: 1430748800.780852: ndtpq.c(20544): this is the log  
May 14 00:00:04  SERVER1 ntp[1006]:  ntpd[Info]: 1430748800.790852: ndtpq.c(20544): this is the log  
May 14 00:00:05  SERVER1 ntp[1006]:  ntpd[Info]: 1430748801.790852: ndtpq.c(20544): thisis really different log  

Desired output:

May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780852: ndtpq.c(20544): this is the log  
May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780853: ndtpq.c(20544): this is another log  
May 14 00:00:05  SERVER1 ntp[1006]:  ntpd[Info]: 1430748801.790852: ndtpq.c(20544): thisis really different log  

I am able to find the unique log messages using the following command, but I lose the timestamp this way.

cat fileName |awk -F: '{print $7}'
  • What is the criteria for determining entries that should be grouped? Is it the content of the message? If so, what if the same message is generated later on? – Tom Fenech May 14 '15 at 08:52
  • It's the content of the message, and it can be ignored if seen later. Only the first occurrence is needed. –  May 14 '15 at 08:59

2 Answers

This may do:

awk -F: '!seen[$NF]++' file
May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780852: ndtpq.c(20544): this is the log
May 14 00:00:01  SERVER1 ntp[1006]:  ntpd[Info]: 1430748797.780853: ndtpq.c(20544): this is another log
May 14 00:00:05  SERVER1 ntp[1006]:  ntpd[Info]: 1430748801.790852: ndtpq.c(20544): thisis really different log

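It splits each line on ":", uses the last field ($NF, the message text) as a key in the array seen, and prints a line only the first time its key appears. The whole line is printed, so the timestamp is kept.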
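The one-liner is compact, so here is a longer sketch of the same idea written out step by step. The function name first_by_message is made up for illustration; it is not part of awk. It assumes the message is everything after the last ":" on the line.

```shell
# Hypothetical helper (name chosen for illustration only):
# print each line whose message (text after the last ':') has not been seen yet.
first_by_message() {
    awk -F: '
    {
        key = $NF                 # last colon-separated field: the message text
        if (!(key in seen)) {     # first time this message appears
            seen[key] = 1         # remember it
            print                 # print the whole line, timestamp included
        }
    }' "$@"
}
```

It can be called as `first_by_message file`, or fed from a pipe with no arguments. Behaviour should match `awk -F: '!seen[$NF]++' file`; the short form relies on the post-increment returning 0 (false) the first time a key is used.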

Jotne

Try this

Awk

 awk -F: '!x[$NF]++' infile

GNU sort, if the output order doesn't matter (the result is sorted by message text, not by time):

 sort -u -t: -k7 infile
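If the input order does matter and sort is still preferred, one possible approach is decorate, sort, undecorate. The following is only a sketch under some assumptions: the helper name first_by_key_sort and its FIELD argument are made up for illustration, and it relies on GNU sort keeping the earliest line of each equal run when -s and -u are combined (other sort implementations may pick a different line). Note that -k takes the key from FIELD to the end of the line, whereas $NF in awk is only the last field, so the two differ when the message itself contains ":".

```shell
# Hypothetical helper: first_by_key_sort FIELD [FILE...]
# FIELD is the 1-based colon-separated field where the message starts
# (7 for the sample log above).
first_by_key_sort() {
    field=$1; shift
    awk '{ print NR ":" $0 }' "$@" |              # decorate: line number becomes field 1
    LC_ALL=C sort -s -u -t: -k"$((field + 1))" |  # one line per message; -s keeps input order within ties
    LC_ALL=C sort -t: -k1,1n |                    # restore the original input order
    cut -d: -f2-                                  # undecorate: drop the line number
}
```

For the sample log this would be called as `first_by_key_sort 7 infile`. The awk answer does the same job in one pass, so this is mainly useful when the data is too large to hold every distinct message in memory.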
Akshay Hegde