I was looking for a way to parse a log file and found what I needed in this question: extract data from log file in specified range of time

However, the most useful answer there (posted by @Kent) does not work for me. This is the code:

# Customize this variable; the important thing is to convert it to seconds.
# e.g. 5days=$((5*24*3600))
x=$((5*60))   # here we take 5 mins as an example

# this line gets the timestamp, in seconds, of the last line of your logfile
last=$(tail -n1 logFile|awk -F'[][]' '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; print d;}' )

# this awk will give you the lines you need:
awk -F'[][]' -v last=$last -v x=$x '{ gsub(/\//," ",$2); sub(/:/," ",$2); "date +%s -d \""$2"\""|getline d; if (last-d<=x)print $0 }' logFile 

I think the error is in the "date +%s -d ..." part.

Running it gives the following error:

sh: -c: line 0: unexpected EOF while looking for matching `"'
sh: -c: line 1: syntax error: unexpected end of file
sh: -c: line 0: unexpected EOF while looking for matching `"'
sh: -c: line 1: syntax error: unexpected end of file

I spent a lot of time trying to solve this before asking here, but I didn't find a solution.

The script will be run from crontab to get the last minute of log lines and count how many times each IP appears in that minute, so I can detect whether it is an attack. That counting is a second task; I hope an expert can post the code for it here in the same question (I think it can be solved in two lines).

kingk110
  • Why reinvent the wheel? There are tools which do things like that already. – Adrian Frühwirth Sep 06 '13 at 19:12
  • Those error messages seem to indicate that you left off a closing double quote on a string somewhere. I would guess that it might be on line 132, in the 37th position, right before the third `if` statement, but that would totally be a guess, because you haven't posted the actual code.... – twalberg Sep 06 '13 at 19:58
  • The code is posted in the link and I mentioned that the most useful answer was in that question. I'll edit the question anyway. and this is the image clarifying the .sh file, the logFile and the result obtained in the terminal http://postimg.org/image/lih0v0gzx/ – kingk110 Sep 07 '13 at 10:10

2 Answers

The problem is probably just that you're not quoting your shell variables. Look:

$ foo='ab cd'

$ awk -v bar="$foo" 'BEGIN{print bar}'
ab cd

$ awk -v bar=$foo 'BEGIN{print bar}'
awk: fatal: cannot open file `BEGIN{print bar}' for reading (No such file or directory)

Yes, I know that's a different error message. What happens when you leave shell variables unquoted can be any number of things depending on the value of the variable, the contents of your directory, etc., and some of them are VERY bad, like removing every file in your filesystem.

So, quote your variables:

-v last="$last" -v x="$x"

then see if you still have the problem.

By the way, here's how to really solve your original problem using GNU awk with the input file http://pastebin.com/BXmS4zLn:

$ cat tst.awk
BEGIN {
    ARGV[ARGC++] = ARGV[ARGC-1]

    mths = "JanFebMarAprMayJunJulAugSepOctNovDec"

    if (days)  { hours = days * 24  }
    if (hours) { mins  = hours * 60 }
    if (mins)  { secs  = mins * 60  }
    deltaSecs = secs
}

NR==FNR {
    nr2secs[NR] = mktime($6" "(match(mths,$5)+2)/3" "$4" "gensub(/:/," ","g",$7))
    next
}

nr2secs[FNR] >= (nr2secs[NR-FNR] - deltaSecs)

$ gawk -v hours=1 -f tst.awk file
157.55.34.99 - -  06 Sep 2013 09:13:10 +0300  "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - -  06 Sep 2013 09:50:23 +0300  "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

$ gawk -v mins=60 -f tst.awk file
157.55.34.99 - -  06 Sep 2013 09:13:10 +0300  "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - -  06 Sep 2013 09:50:23 +0300  "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

$ gawk -v mins=20 -f tst.awk file
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"

You can specify days= or hours= or mins= or secs= variables and it'll do the right thing.
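
The month lookup in tst.awk is terse, so a small sketch may help. match() returns the 1-based position of the abbreviation inside the 36-character month string, and (position + 2) / 3 turns positions 1, 4, 7, ... into 1, 2, 3, .... The month_num wrapper below is a hypothetical helper for illustration only, not part of the script above, and it assumes well-formed three-letter English abbreviations.

```shell
#!/bin/sh
# month_num NAME: print the month number for a three-letter English
# abbreviation, using the same match() arithmetic as tst.awk.
# Hypothetical helper for illustration; prints 0 when NAME is not found.
month_num() {
    awk -v m="$1" 'BEGIN {
        mths = "JanFebMarAprMayJunJulAugSepOctNovDec"
        pos = match(mths, m)          # 1 for Jan, 4 for Feb, ..., 34 for Dec
        print (pos ? (pos + 2) / 3 : 0)
    }'
}

month_num Sep    # position 25, so (25 + 2) / 3 = 9
```

Because the abbreviation is used as a pattern, a malformed value such as "an" would match in the middle of a name and give a fractional result, which is why the well-formed-input assumption matters.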

If you only need a script to get the last minute's worth of log lines, as your question states (now?), and want to see a one-liner to do it:

$ gawk 'NR==FNR {nr2secs[++nr] = mktime($6" "(match("JanFebMarAprMayJunJulAugSepOctNovDec",$5)+2)/3" "$4" "gensub(/:/," ","g",$7)); next} nr2secs[FNR] >= (nr2secs[nr] - 60)' file file
83.113.48.218 - -  06 Sep 2013 10:13:07 +0300  "GET /english/nicons/word.gif HTTP/1.1" 200 803 "http://mydomain.com/french/details.php?eid=127928&cid=18&fromval=1&frid=18" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)"
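
For the second part of the question (counting how often each IP appears in the selected lines), a minimal sketch is to pipe the filtered output through a counter. It assumes the client IP is the first whitespace-separated field, as in the sample log. count_ips is a hypothetical name, and the threshold for calling something an attack is left to you.

```shell
#!/bin/sh
# count_ips: read log lines on stdin and print "<hits> <ip>" per client,
# busiest first. Hypothetical helper; assumes the IP is field 1.
count_ips() {
    awk 'NF { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' |
        sort -k1,1nr -k2,2
}

# Example with three made-up lines (documentation-range addresses):
printf '%s\n' \
    '192.0.2.1 - - a' \
    '192.0.2.7 - - b' \
    '192.0.2.1 - - c' | count_ips
```

In practice the input would be the output of the gawk one-liner above, piped into count_ips instead of printed to the terminal.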
Ed Morton

Based on your input here, you could use a script like this:

#!/bin/bash

LOGFILE=/path/to/logfile

X=$(( 60 * 60 )) ## 1 Hour

function get_ts {
    DATE="${1%%\]*}"; DATE="${DATE##*\[}"; DATE=${DATE/:/ }; DATE=${DATE//\// }
    TS=$(date -d "$DATE" '+%s')
}

get_ts "$(tail -n 1 "$LOGFILE")"
LAST=$TS

while read -r LINE; do
    get_ts "$LINE"
    (( (LAST - TS) <= X )) && echo "$LINE"
done < "$LOGFILE"

Save it to a file, change the value of LOGFILE, then run it with bash script.sh.

Example output:

157.55.34.99 - - [06/Sep/2013:09:13:10 +0300] "GET /index.php HTTP/1.1" 200 16977 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
85.163.134.149 - - [06/Sep/2013:09:50:23 +0300] "GET /wap/wapicons/mnrwap.jpg HTTP/1.1" 200 1217 "http://mydomain.com/main.php" "Mozilla/5.0 (Linux; U; Android 4.1.2; en-gb; GT-I9082 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
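
To make the parameter expansions in get_ts easier to follow, here is a sketch that applies them one step at a time to a sample line. It requires bash, like the script above. extract_date is a hypothetical name used only for this illustration, and it leaves out the call to date so that it shows just the string handling.

```shell
#!/bin/bash
# extract_date LINE: print the timestamp of an access-log line in the
# "DD Mon YYYY HH:MM:SS +ZZZZ" form that get_ts passes to date -d.
# Hypothetical helper mirroring the expansions used in get_ts.
extract_date() {
    local d
    d="${1%%\]*}"     # drop everything from the first "]" to the end
    d="${d##*\[}"     # drop everything up to and including the last "["
    d="${d/:/ }"      # replace the first ":" (between date and time) with a space
    d="${d//\// }"    # replace every "/" with a space
    printf '%s\n' "$d"
}

extract_date '157.55.34.99 - - [06/Sep/2013:09:13:10 +0300] "GET /index.php HTTP/1.1" 200 16977'
# prints: 06 Sep 2013 09:13:10 +0300
```

If a line has no bracketed timestamp, the expansions leave most of the line in place, so date would be handed a string it cannot parse; the real script may want to check for that case.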
konsolebox
  • The same result is shown. I know the crontab features. I meant the command that will count each IP and print the number of URL requests followed by the IP. I may find that command if I search; it is not the main issue. – kingk110 Sep 06 '13 at 18:39
  • @kingk110 Mind showing us some crucial parts of the codes you used? Especially the one that is called through `sh -c`. – konsolebox Sep 06 '13 at 19:01
  • The code is posted in the link and I mentioned that the most useful answer was in that question. I'll edit the question anyway. – kingk110 Sep 07 '13 at 07:52
  • @kingk110 That's actually a limitation of awk. When you call an external command from it, it depends on the shell, passing the command string to it to be re-evaluated. If your input somehow contains characters that alter the syntax, it will cause a syntax error there. Make sure the command string that the shell will evaluate is syntactically correct, whether or not the input gives you dangerous characters like `"`. Also, I know that's the code you're basing yours on, but I expect that it's not exactly the code you're running, or at least that the input is not the same. Please show it to us. – konsolebox Sep 07 '13 at 09:05
  • Here is the .sh file and the logFile and the result as it was executed http://postimg.org/image/lih0v0gzx/ – kingk110 Sep 07 '13 at 10:07
  • @kingk110 Are there no extra strings after `"GET `? That unpaired `"` certainly would cause syntax error to the shell awk is calling. – konsolebox Sep 07 '13 at 10:13
  • @kingk110 It seems that somehow there are instances where awk gets a string with `"` instead. Can you update your post with a sample of those files so that we can examine it ourselves? – konsolebox Sep 07 '13 at 11:04
  • here is a sample of log file with different times that I copied from it http://pastebin.com/BXmS4zLn – kingk110 Sep 07 '13 at 11:45
  • @kingk110 I made a script that bases mostly on bash instead. I just can't recommend using awk to process inputs or varying data on a subshell sorry. – konsolebox Sep 07 '13 at 12:24
  • I am so sorry, but it seems that I posted the wrong link, where the [] are deleted and the date format differs http://pastebin.com/i2ewC7i0 – kingk110 Sep 07 '13 at 14:05
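
To illustrate the point made in the comments about awk handing command strings to the shell: one common way to keep a stray `"` in the input from breaking the command is to single-quote the value before splicing it into the command string. The sketch below is an assumption about how one might do that, not the code from the question. shquote and safe_echo are hypothetical names, and printf stands in for date so the example does not depend on GNU date.

```shell
#!/bin/sh
# safe_echo VALUE: pass VALUE from awk to a shell command without letting
# quotes in VALUE change the syntax of the command. Hypothetical demo helper.
safe_echo() {
    awk -v s="$1" '
    # Wrap str in single quotes; \047 is the single-quote character, and
    # each embedded single quote becomes: quote, backslash-quote, quote.
    function shquote(str) {
        gsub(/\047/, "\047\\\047\047", str)
        return "\047" str "\047"
    }
    BEGIN {
        cmd = "printf %s " shquote(s)
        cmd | getline out
        close(cmd)
        print out
    }'
}

safe_echo '"GET'    # an unpaired double quote passes through intact
```

With GNU date, the same idea would be to build the command as "date +%s -d " shquote($2) instead of wrapping $2 in double quotes by hand.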