40

I want to extract information from a log file using a shell script (bash) based on time range. A line in the log file looks like this:

172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"

I want to extract data for specific intervals. For example, I need to look only at the events which happened during the last X minutes, or X days before the last recorded entry. I'm new to shell scripting, but I have tried to use the grep command.

codeforester
ham raaz
  • Are you familiar with awk/sed? – Foo Bah Sep 27 '11 at 20:34
  • This question needs to be narrowed in scope so that it is not so general purpose. If the question were to say "I want to gather all loglines from the current hour" then this question CAN be answered. Otherwise, it's NOT an answerable question because of all the edge cases, like gathering logs from 1 hour ago (what if it's 12:30AM?), plus the other raised issues, and the question REALLY asks "What software libraries can assist with processing and interpreting standard logfiles?". Because this problem is solved that way, and it's not simple regex. Also, yesterday was a leap day. :-) – Scott Prive Mar 01 '16 at 16:16

5 Answers

62

You can use sed for this. For example:

$ sed -n '/Feb 23 13:55/,/Feb 23 14:00/p' /var/log/mail.log
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: connect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: lost connection after CONNECT from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie postfix/smtpd[20964]: disconnect from localhost[127.0.0.1]
Feb 23 13:55:01 messagerie pop3d: Connection, ip=[::ffff:127.0.0.1]
...

How it works

The -n switch tells sed not to print every line it reads, which it would otherwise do by default.

The trailing p after the regular expressions tells it to print the lines selected by the address range.

The expression '/pattern1/,/pattern2/' selects everything from the first line matching pattern1 through the first subsequent line matching pattern2, inclusive. In this case it prints every line between the first occurrence of the string Feb 23 13:55 and the first occurrence of the string Feb 23 14:00.

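The same range technique carries over to the Apache-style timestamps in the question. A minimal self-contained sketch (the file name access.log and the log lines are made up for illustration; note that the slashes in the timestamp must be escaped in the pattern):

```shell
# Fabricated sample log in the question's format
printf '%s\n' \
  '172.16.0.3 - - [31/Mar/2002:19:29:41 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:31:10 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:35:02 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:40:00 +0200] "GET / HTTP/1.1" 200 123' > access.log

# Print from the first line matching 19:30 through the first line matching 19:35
sed -n '/31\/Mar\/2002:19:30/,/31\/Mar\/2002:19:35/p' access.log
```

This prints the 19:30, 19:31 and 19:35 lines and skips the rest. As the comments point out, the end pattern must actually occur in the file, otherwise sed keeps printing to the end.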

ychaouche
  • This appears to be the best solution; it lacks an explanation of what it does for non-sed addicts, but this is great. – Adam Feb 23 '15 at 16:25
  • My comment was unaccepted (too long) so here's a [wiki page](http://ychaouche.informatick.net/logsearch) about how that particular command works, how sed works in general, and why you should learn sed if you only know python – ychaouche Feb 24 '15 at 08:24
  • NB: this stops on the first line that satisfies the end clause, so if there are multiple 14:00 lines only the first would be returned. – Cliffordlife Sep 24 '15 at 15:11
  • Right. You can easily work around this by choosing the next timestamp as the right part of the address range. – ychaouche Oct 13 '15 at 16:18
  • If you know what timestamps are in the log and it is densely populated, that's a viable approach. If you don't get entries every minute / hour / day your regex might not have matches on your start or end regex, and then you get nothing at all, or too much, respectively. – tripleee Feb 05 '16 at 07:50
  • The log doesn't have to be densely populated. What I do in general is to first skim through the log with `less` and try to spot the start and end date by hand, then use those values in the sed command. – ychaouche Oct 06 '16 at 10:43
  • Brilliant answer & very fast too. – Sumit Jain Mar 08 '17 at 10:10
  • This does not work! What if the string "Feb 23 13:55" is not in the log file, but the first entry is "Feb 23 13:56:01". This accounts for both "pattern1" and "pattern2" of the range `/pattern1/,/pattern2/` – kvantour Oct 10 '18 at 09:04
  • This may be the best answer **out of the ones given to this question**, but IMHO **none** of them do what the OP actually asked! They (and I) want to extract data based on a **variable** date/time, not a fixed one. I would even go so far as to say that the other answers probably come closer, although none of them worked for me (presumably because my logfile has a very different format). – Kenny83 Apr 06 '20 at 06:54
  • Is sed parsing the date pattern as a date to do comparison checks? How does it know what dates are between the given two? – MrChadMWood Jul 13 '23 at 22:59
  • @MrChadMWood No, it's treating them as BRE (Basic Regular Expressions), unless you use GNU sed which supports the -r flag for ERE (Extended Regular Expressions). – ychaouche Jul 16 '23 at 12:33
  • In that case, it would return records pertaining to `Feb 23 13:55` or `Feb 23 14:00`, not records between `Feb 23 13:55` and `Feb 23 14:00`. Is that right? e.g. `Feb 23 13:56` would not be included in results via this method? Thanks. – MrChadMWood Jul 18 '23 at 17:39
  • @MrChadMWood, sed has ranges. It will print whatever is between the first occurrence of the first pattern and the first occurrence of the second pattern; this includes lines starting with Feb 23 13:56, because log files are naturally sorted by date. – ychaouche Jul 23 '23 at 09:22
37

Use grep and regular expressions. For example, if you want a 4-minute interval of logs:

grep "31/Mar/2002:19:3[1-5]" logfile

will return all log lines between 19:31 and 19:35 on 31/Mar/2002. Supposing you need the last 5 days starting from today, 27/Sep/2011, you may use the following:

grep "2[3-7]/Sep/2011" logfile
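A self-contained sketch of the bracket range at work (the file name logfile and its lines are fabricated here):

```shell
# Fabricated sample lines in the question's format
printf '%s\n' \
  '172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:33:05 +0200] "GET / HTTP/1.1" 200 123' \
  '172.16.0.3 - - [31/Mar/2002:19:36:12 +0200] "GET / HTTP/1.1" 200 123' > logfile

# [1-5] matches a single digit between 1 and 5, so this hits minutes 31-35 only
grep "31/Mar/2002:19:3[1-5]" logfile
```

Only the 19:33:05 line matches; 19:30 and 19:36 fall outside [1-5]. This only works for windows that line up neatly with digit boundaries.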
ztank1013
  • My opinion: grep is not the right tool to solve this. Comparing dates using a regex can be very difficult, e.g. if I want all records from 3 days, 15 hours and 32 minutes before the last record. There could also be a month change (the last/first day of a month), a day change (the first/last hour of a day), even a year change. It can get very complicated with a regex, even if it is possible. – Kent Sep 27 '11 at 21:44
  • You may be right, but mine was a quick and dirty solution to a complex problem, especially with that kind of date format. If you want real power filtering logs you should probably use different tools... – ztank1013 Sep 28 '11 at 07:12
8

Well, I spent some time on your date format... but I finally worked it out.

Let's take an example file (named logFile); I made it a bit short. Say you want to get the last 5 minutes' worth of log in this file:

172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET 
### lines below are what you want (5 mins till the last record)
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 
172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET 

here is the solution:

# This variable you can customize; the important thing is to convert it to seconds,
# e.g. 5 days = $((5*24*3600))
x=$((5*60))   # here we take 5 minutes as an example

# This line gets the timestamp, in seconds, of the last line of your log file
last=$(tail -n1 logFile | awk -F'[][]' '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\"" | getline d; print d; }')

# This awk command gives you the lines you need:
awk -F'[][]' -v last="$last" -v x="$x" '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\"" | getline d; if (last-d<=x) print $0 }' logFile

output:

172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:27:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET 
172.16.0.3 - -  31 Mar 2002 19:30:41 +0200  "GET

EDIT

You may notice that the [ and ] have disappeared from the output. If you want them back, change print $0 at the end of the last awk line to print $1 "[" $2 "]" $3.
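A possible refinement, not part of the answer above: the per-line call to date spawns one external process per log line, which gets slow on big files. If GNU awk is available, mktime() can do the conversion in-process. This is only a sketch of the same filtering idea; the month-name lookup and the two-pass structure are my own additions, and the timezone offset field is ignored (harmless when every entry shares one offset):

```shell
# A tiny stand-in for logFile (fabricated lines)
printf '%s\n' \
  '172.16.0.3 - - [31/Mar/2002:19:20:41 +0200] "GET' \
  '172.16.0.3 - - [31/Mar/2002:19:27:41 +0200] "GET' \
  '172.16.0.3 - - [31/Mar/2002:19:30:41 +0200] "GET' > logFile

# Two passes over the file: the first remembers the last timestamp, the
# second prints lines within the final x seconds. Requires GNU awk for mktime().
awk -F'[][]' -v x=300 '
  function ts(s,    a, m) {
    # s looks like "31/Mar/2002:19:27:41 +0200"
    split(s, a, /[\/: ]/)
    m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", a[2]) + 2) / 3
    return mktime(a[3] " " m " " a[1] " " a[4] " " a[5] " " a[6])
  }
  NR == FNR { last = ts($2); next }   # pass 1 ends holding the final timestamp
  ts($2) >= last - x                  # pass 2: keep the recent lines
' logFile logFile
```

With x=300 this keeps the 19:27:41 and 19:30:41 lines and drops the 19:20:41 one.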

Kent
  • I don't actually understand the solution, as I am new to sed and awk. My log file contains lines as follows: 2014-02-27 21:37:35 supervisor [INFO] Starting supervisor with id 3100de93-8c33-43a9-8e2f-2b8c3d926831 at host. How do I extract lines from this log file which occurred in the last minute? Please help. – Manikandan Kannan Feb 27 '14 at 22:41
4

I used this command to find the last 5 minutes of logs for a particular event, "DHCPACK"; try the below:

$ grep "DHCPACK" /var/log/messages | grep "$(date +%h\ %d) [$(date --date='5 min ago' +%H)-$(date +%H)]:*:*"
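Note that the hour bracket above misbehaves when the window crosses an hour boundary. A hedged sketch of an alternative: compare the HH:MM:SS field lexically, which works because fixed-width times sort chronologically within one day. The file messages.sample and its lines are fabricated; in real use start/end would come from date as shown in the comment:

```shell
# Fabricated syslog-style lines
printf '%s\n' \
  'Sep 27 19:20:00 host dhcpd: DHCPACK on 10.0.0.5' \
  'Sep 27 19:28:30 host dhcpd: DHCPACK on 10.0.0.6' \
  'Sep 27 19:29:55 host dhcpd: DHCPACK on 10.0.0.7' > messages.sample

# Pretend "now" is 19:30:00; in real use:
#   start=$(date --date='5 min ago' +%H:%M:%S); end=$(date +%H:%M:%S)
start='19:25:00'; end='19:30:00'

# Field 3 of a syslog line is the HH:MM:SS time; string comparison suffices
grep DHCPACK messages.sample | awk -v s="$start" -v e="$end" '$3 >= s && $3 <= e'
```

This keeps the 19:28:30 and 19:29:55 lines and drops the 19:20:00 one; it still assumes the whole window falls on the same day.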
sdeva
-1

You can use this to get the current time and the log line times:

#!/bin/bash

log="log_file_name"
while read line
do
  current_hours=`date | awk 'BEGIN{FS="[ :]+"}; {print $4}'`
  current_minutes=`date | awk 'BEGIN{FS="[ :]+"}; {print $5}'`
  current_seconds=`date | awk 'BEGIN{FS="[ :]+"}; {print $6}'`

  log_file_hours=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $7}'`
  log_file_minutes=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $8}'`
  log_file_seconds=`echo $line | awk 'BEGIN{FS="[ [/:]+"}; {print  $9}'`    
done < $log

Then compare the log_file_* and current_* variables.
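The comparison itself is left unstated; one way to sketch it (the variable values below are hypothetical stand-ins for what the loop extracts) is to convert both times to seconds since midnight, which assumes both timestamps fall on the same day:

```shell
# Hypothetical values, as the loop above might extract them
current_hours=19; current_minutes=35; current_seconds=10
log_file_hours=19; log_file_minutes=30; log_file_seconds=41

# Convert to seconds since midnight; 10# forces base 10 so that
# zero-padded values like "08" are not parsed as octal
current_total=$((10#$current_hours * 3600 + 10#$current_minutes * 60 + 10#$current_seconds))
log_total=$((10#$log_file_hours * 3600 + 10#$log_file_minutes * 60 + 10#$log_file_seconds))

# Keep the line if it is at most 5 minutes (300 s) older than now
if [ $((current_total - log_total)) -le 300 ]; then
  echo "within the last 5 minutes"
fi
```

With the values above the difference is 269 seconds, so the line would be kept.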

nick
  • This code has multiple problems. You should be using `read -r` and properly quote the values of `$line` and `$log` when you interpolate them. The huge number of external processes wasted on stuff which can be done with Bash internals rather more simply also speaks against this solution, although it is of course not technically wrong. – tripleee Nov 26 '15 at 06:36