I want to know how many users have visited google.com through my proxy within the last 30 minutes.

 awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log

The logs look like this

2017-02-19 12:09:44|test@gmail.com|200|https://google.com/
2017-02-19 12:10:23|test@gmail.com|200|https://google.com/

Now I can easily count the number of records:

 awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log | wc -l

Output is 2.

How can I modify the command to display only records with a unique email? In the above case the output should be 1.

user2650277

3 Answers

To list the matching records:

awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++
  ' access.log

To get the count:

awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++{ count++ }
    END{ print count+0 }
  ' access.log
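The deduplication in both commands rests on the `!seen[$2]++` idiom. Here is a minimal sketch of that idiom on its own, using hypothetical sample records in the pipe-separated format shown in the question (the second address is made up for illustration):

```shell
# Hypothetical sample data in the question's pipe-separated format.
sample='2017-02-25 23:45:23|test@gmail.com|200|https://google.com/
2017-02-25 23:46:10|test@gmail.com|200|https://google.com/
2017-02-25 23:47:02|other@gmail.com|200|https://google.com/'

# seen[$2]++ evaluates to 0 (false) the first time an email appears and to a
# positive number afterwards, so !seen[$2]++ is true only for the first
# record of each email.
unique_count=$(printf '%s\n' "$sample" |
  awk -F'|' '!seen[$2]++ { n++ } END { print n+0 }')

echo "$unique_count"
```

With these three records the sketch counts two distinct emails. The `n+0` in the `END` block makes awk print `0` rather than an empty line when nothing matches.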

For testing:

# Current datetime of my system
$ date +'%Y-%m-%d %H:%M:%S'
2017-02-26 00:06:19

# Datetime 30 minutes ago
$ date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago'
2017-02-25 23:36:20

# Input file; I modified the datetimes to test the command
$ cat f
2017-02-25 23:10:44|test@gmail.com|200|https://google.com/
2017-02-25 23:45:23|test@gmail.com|200|https://google.com/

Output 1: list the matching records

$ awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++
  ' f
2017-02-25 23:45:23|test@gmail.com|200|https://google.com/

Output 2: print the count

$ awk -v FS='|' -v bt="$(date +'%Y-%m-%d %H:%M:%S' -d '30 minutes ago')" '
    ($1 > bt) && $4~/google.com/  && !seen[$2]++{ count++ }
    END{ print count+0 }
  ' f
1
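
The commands above compare formatted datetimes as strings. The command in the question instead compares `$1` against epoch seconds and uses awk's default whitespace splitting, which suggests the raw log may store epoch timestamps. As a sketch under that assumption (the sample log lines and the fixed cutoff below are hypothetical), the same `seen` idiom can be added to the original filter:

```shell
# Hypothetical raw log: epoch seconds, email, status, URL, separated by spaces.
raw='1488060000 test@gmail.com 200 https://google.com/
1488060600 test@gmail.com 200 https://google.com/
1488061200 other@gmail.com 200 https://example.com/'

# Fixed cutoff so the demo is reproducible; on a live system with GNU date
# one would use: bt=$(date +%s -d '30 minutes ago')
bt=1488059000

# Keep records newer than the cutoff whose URL matches google.com, and count
# only the first record seen for each email.
epoch_count=$(printf '%s\n' "$raw" |
  awk -v bt="$bt" '($1 > bt) && $4 ~ /google\.com/ && !seen[$2]++ { n++ }
                   END { print n+0 }')

echo "$epoch_count"
```

Here the dot in `google\.com` is escaped so that it matches a literal dot rather than any character.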
Akshay Hegde

You can use sort to select unique email accounts.

You can also refer to is-there-a-way-to-uniq-by-column.

慕冬亮

Simply pipe the output to

sort -u -t "|" -k2,2

So you will have something like:

awk -v bt=$(date "+%s" -d "30 minutes ago") '($1 > bt) && $4~/google.com/ {printf("%s|%s|%s|%s\n", strftime("%F %T",$1), $2 , $3, $4)} ' access.log | sort -u -t "|" -k2,2
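
One detail worth checking with `sort -u` is the key range: `-k2` makes the key run from field 2 to the end of the line, whereas `-k2,2` limits it to the email field. A small sketch with hypothetical records (the differing status code and the second address are made up) shows the difference:

```shell
# Hypothetical records: the same email appears with two different status codes.
sample='2017-02-25 23:45:23|test@gmail.com|200|https://google.com/
2017-02-25 23:46:10|test@gmail.com|404|https://google.com/
2017-02-25 23:47:02|other@gmail.com|200|https://google.com/'

# -t '|' sets the field delimiter; -k2,2 restricts the key to field 2 only.
# tr strips the padding that some wc implementations add.
by_email=$(printf '%s\n' "$sample" | sort -u -t '|' -k2,2 | wc -l | tr -d ' ')

# -k2 alone uses everything from field 2 to the end of the line as the key,
# so records that differ only in status count as distinct.
by_rest=$(printf '%s\n' "$sample" | sort -u -t '|' -k2 | wc -l | tr -d ' ')

echo "$by_email $by_rest"
```

Appending `| wc -l` to the full pipeline gives the count of unique emails that the question asks for.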
farghal