
I am using a cPanel account and have an Apache 2.4 access log that stores its logs like:

66.249.93.30 - - [04/May/2018:21:26:39 +0200] "GET / HTTP/1.1" 302 207 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/41.0.2272.118 Safari/537.36"
66.249.93.30 - - [05/May/2018:10:26:39 +0200] "GET / HTTP/1.1" 302 207 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/41.0.2272.118 Safari/537.36"

The timestamp format corresponds to date "+%d/%b/%Y:%H:%M:%S %z" (abbreviated month name, zero-padded 24-hour clock, followed by the UTC offset).

Using a bash script I would like to extract just the lines that were logged in the last hour, for example:

Full Log file:

66.249.93.30 - - [04/May/2018:21:26:39 +0200] First Line
66.249.93.30 - - [05/May/2018:11:00:21 +0200] Second Line
66.249.93.30 - - [05/May/2018:11:15:39 +0200] Third Line
66.249.93.30 - - [05/May/2018:12:00:11 +0200] Fourth Line

Current Time: 05/May/2018:12:01:06

Logs wanted: 5 May, from 11:01 to 12:01 (the hour before the current time)

Filtered Output:

66.249.93.30 - - [05/May/2018:11:15:39 +0200] Third Line
66.249.93.30 - - [05/May/2018:12:00:11 +0200] Fourth Line

I have tried awk and several other suggestions, but I can't get it to work. Any help will be appreciated!

Ivan Denchev
  • See: [extract last 10 minutes from logfile](https://stackoverflow.com/q/20649387/3776858) – Cyrus May 05 '18 at 10:22
  • Hey Cyrus, thank you for the link. I went through it about an hour ago and I'm still trying to adjust the comparison; hopefully I will get it to work :) – Ivan Denchev May 05 '18 at 10:51

2 Answers

$ date
Sat, May 05, 2018 10:49:13 AM

$ cat tst.awk
{
    split($4,t,/[[ :\/]/)
    mthNr = sprintf("%02d",(index("JanFebMarAprMayJunJulAugSepOctNovDec",t[3])+2)/3)
    curTime = t[4] mthNr t[2] t[5] t[6] t[7]
}
curTime >= minTime

$ awk -v minTime=$(date -d '60 min ago' '+%Y%m%d%H%M%S') -f tst.awk file
66.249.93.30 - - [05/May/2018:11:00:21 +0200] Second Line
66.249.93.30 - - [05/May/2018:11:15:39 +0200] Third Line
66.249.93.30 - - [05/May/2018:12:00:11 +0200] Fourth Line

Using the time from your question to get the expected output in your question:

$ awk -v minTime=$(date -d '2018/05/05 11:01:06' '+%Y%m%d%H%M%S') -f tst.awk file
66.249.93.30 - - [05/May/2018:11:15:39 +0200] Third Line
66.249.93.30 - - [05/May/2018:12:00:11 +0200] Fourth Line
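
The month lookup in tst.awk may look cryptic at first. As a minimal sketch (the function name month_number is made up for illustration and is not part of the script above): index() returns the position of the abbreviation inside the 36-character string (1 for Jan, 4 for Feb, 7 for Mar, ...), and adding 2 then dividing by 3 turns that position into the month number.

```shell
# Hypothetical helper: print the two-digit month number for an English
# three-letter month abbreviation, using the same index() arithmetic as tst.awk.
month_number() {
    awk -v m="$1" 'BEGIN {
        # index() gives 1 for Jan, 4 for Feb, ..., 34 for Dec
        printf "%02d\n", (index("JanFebMarAprMayJunJulAugSepOctNovDec", m) + 2) / 3
    }'
}

month_number Jan   # prints 01
month_number May   # prints 05
month_number Dec   # prints 12
```

Because the resulting timestamp is a fixed-width YYYYmmddHHMMSS string, comparing it against minTime orders the entries chronologically without any call to date per line.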
Ed Morton
  • Thanks Ed, I've gone through your suggestion and I'll try to implement it tomorrow. I will also read the link that you provided - my script does take around 3-4 seconds to handle around 200 - 300 lines of access logs. I appreciate your time - thanks again! – Ivan Denchev May 05 '18 at 20:11

I was able to figure it out!

I had to convert a log timestamp such as 04/May/2018:21:26:39 to a UNIX timestamp. This is done with date (note the lowercase %s, seconds since the epoch; uppercase %S would print only the seconds field):

date -d "YEAR-MONTH-DAY HR:M:S" "+%s"

Then make another UNIX timestamp that's 60 minutes in the past:

date -d "60 min ago" "+%s"

Then, in an if conditional, keep every log entry whose UNIX timestamp is greater than ( -gt ) the timestamp from 60 minutes ago.
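
The conversion described above can be sketched as follows. This assumes GNU date (the -d option), and the function name apache_ts_to_epoch is hypothetical. Unlike the steps above, the sketch passes the log's UTC offset along to date, so the result does not depend on the machine's local timezone.

```shell
# Hypothetical helper: apache_ts_to_epoch "05/May/2018:11:15:39 +0200"
# Prints seconds since the Unix epoch. Requires GNU date for -d.
apache_ts_to_epoch() {
    ts=$1
    day=${ts%%/*}       # 05
    rest=${ts#*/}       # May/2018:11:15:39 +0200
    mon=${rest%%/*}     # May
    rest=${rest#*/}     # 2018:11:15:39 +0200
    year=${rest%%:*}    # 2018
    clock=${rest#*:}    # 11:15:39 +0200
    # GNU date accepts "DAY MONTH YEAR TIME ZONE" with an English month name
    date -d "$day $mon $year $clock" '+%s'
}

cutoff=$(date -d '60 min ago' '+%s')
log_epoch=$(apache_ts_to_epoch '05/May/2018:11:15:39 +0200')   # 1525511739

if [ "$log_epoch" -gt "$cutoff" ]; then
    echo "entry is from the last hour"
fi
```

To feed it a real log line, the argument would be fields 4 and 5 with the surrounding brackets stripped.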

With my current setup:

cPanel + Apache 2.4

Logging Format: /home/$USER/public_html_cron_logs/$DAY/$HOUR-$MINUTE-[GET|POST].log

Like /home/$USER/public_html_cron_logs/05-05-2018/14-53-GET.log

#!/bin/bash

LOG_DIR="public_html_cron_logs"

DAY=$(date +"%d-%m-%Y")
HOUR=$(date "+%H-%M")
GET_LOG="GET.log"
POST_LOG="POST.log"

# Create today's directory (and any missing parents) if it does not exist yet
CREATE_DIR="/home/$USER/$LOG_DIR/$DAY"
mkdir -p "$CREATE_DIR"

GET_LOG="$CREATE_DIR/$HOUR-$GET_LOG"
POST_LOG="$CREATE_DIR/$HOUR-$POST_LOG"

# Compute the cutoff once instead of on every iteration
UNIX_TIMESTAMP_LAST_HOUR=$(date -d '60 min ago' "+%s")

while IFS= read -r line; do

    # Field 4 looks like "[05/May/2018:11:15:39"; strip the leading "["
    DATE_LOG=$(echo "$line" | awk '{print $4}'); DATE_LOG=${DATE_LOG:1}
    MONTH_VERB=$(echo "$DATE_LOG" | awk -F '[/:]' '{print $2}')

    # Apache logs the abbreviated month name (Jan, Feb, ...), not the full name
    case "$MONTH_VERB" in
        Jan) MONTH=01 ;;
        Feb) MONTH=02 ;;
        Mar) MONTH=03 ;;
        Apr) MONTH=04 ;;
        May) MONTH=05 ;;
        Jun) MONTH=06 ;;
        Jul) MONTH=07 ;;
        Aug) MONTH=08 ;;
        Sep) MONTH=09 ;;
        Oct) MONTH=10 ;;
        Nov) MONTH=11 ;;
        Dec) MONTH=12 ;;
        *)   continue ;;   # skip lines without a recognizable date
    esac

    # Rebuild the date as "YEAR-MONTH-DAY HR:M:S" (interpreted as local time) and convert it
    UNIX_DATE=$(echo "$DATE_LOG" | awk -v AWK_MONTH="$MONTH" -F '[/:]' '{print $3"-"AWK_MONTH"-"$1" "$4":"$5":"$6}')
    UNIX_TIMESTAMP_LOG=$(date -d "$UNIX_DATE" "+%s")

    # Keep only entries newer than the cutoff; write the client IP to the GET or POST log
    if [ "$UNIX_TIMESTAMP_LOG" -gt "$UNIX_TIMESTAMP_LAST_HOUR" ]; then
        if [[ $line = *"GET"* ]]; then
            echo "$line" | awk '{print $1}' >> "$GET_LOG"
        else
            echo "$line" | awk '{print $1}' >> "$POST_LOG"
        fi
    fi

done < ~/access-logs/ENTER_YOUR_DOMAIN_LOG_FILE_HERE
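
For comparison, the same filter-and-split step could be done in a single awk pass instead of a shell loop that starts several processes per line. This is only a sketch under assumptions, not the script above: the function name split_recent and its argument order are invented for illustration, the cutoff is passed as a YYYYmmddHHMMSS string (as in the other answer), and the request method is assumed to sit in field 6, as it does in the sample log lines.

```shell
# Hypothetical helper: split_recent LOGFILE MIN_TIME GET_OUT POST_OUT
# MIN_TIME is a YYYYmmddHHMMSS string, e.g. $(date -d '60 min ago' '+%Y%m%d%H%M%S')
split_recent() {
    awk -v minTime="$2" -v getLog="$3" -v postLog="$4" '
    {
        # $4 looks like "[05/May/2018:11:15:39": drop the "[" and split on "/" and ":"
        split(substr($4, 2), t, /[\/:]/)
        mthNr = sprintf("%02d", (index("JanFebMarAprMayJunJulAugSepOctNovDec", t[2]) + 2) / 3)
        curTime = t[3] mthNr t[1] t[4] t[5] t[6]
    }
    # Appending "" forces a string comparison of the two fixed-width timestamps
    (curTime "") >= (minTime "") {
        if ($6 ~ /GET/) print $1 >> getLog    # client IP of a GET request
        else            print $1 >> postLog   # everything else, as in the loop above
    }' "$1"
}
```

A call might look like split_recent ~/access-logs/ENTER_YOUR_DOMAIN_LOG_FILE_HERE "$(date -d '60 min ago' '+%Y%m%d%H%M%S')" "$GET_LOG" "$POST_LOG". The comparison uses the wall-clock time written in the log, so it assumes the log and the machine running date share a timezone.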
Ivan Denchev
  • That's the wrong approach, see http://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why. – Ed Morton May 05 '18 at 15:29