
I want to extract the paths of the GET requests from my server logs.

For example, here is an excerpt from the server log:

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not found
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not found
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

When I try a simple grep or awk command, such as

Adi:~ adi$ awk '/GET/, /HTTP/' serverlogs.txt

it prints the entire matching lines:

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

I just want to display: hello and ss

Is there a way to do this?

Jonathan Leffler
aditya.gupta

6 Answers


Assuming you have GNU grep, you can use Perl-compatible regular expressions (-P) to do a positive lookbehind:

grep -oP '(?<=GET\s/)\w+' file
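As a variant of the same idea (still assuming GNU grep built with PCRE support for -P), `\K` drops everything matched before it from the printed output, and `[^ ]+` takes every character up to the next space, so paths containing `-`, `.` or `/`, which `\w+` would cut short, come through whole. The function name `get_paths_pcre` is a hypothetical wrapper, not part of the original answer:

```shell
# Hypothetical wrapper: reads the files given as arguments, or stdin if none.
get_paths_pcre() {
  # 'GET /' must be present, but \K removes it from the printed match;
  # [^ ]+ then prints the request path up to the next space.
  grep -oP 'GET /\K[^ ]+' "$@"
}

# Usage: get_paths_pcre serverlogs.txt
```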

If you don't have GNU grep, I'd advise just using sed:

sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' file

If you happen to have GNU sed, that can be greatly simplified:

sed -n '/^.*GET\s\+\/\(\w\+\).*$/s//\1/p' file
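A sketch of a single sed command that should behave the same with GNU sed and with the BSD sed shipped on macOS, assuming your sed accepts -E for extended regular expressions (both of those do). It anchors on the literal `"GET /` seen in the sample log and keeps everything up to the next space; `get_paths_sed` is a hypothetical name:

```shell
# Hypothetical wrapper around one sed command using extended regexes (-E).
get_paths_sed() {
  # -n suppresses automatic printing; the p flag prints a line only when
  # the substitution succeeded, i.e. only on lines with a GET request.
  # ([^ ]*) captures the path after "GET / up to the next space.
  sed -nE 's/.*"GET \/([^ ]*) .*/\1/p' "$@"
}

# Usage: get_paths_sed serverlogs.txt
```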

The bottom line is that you don't need pipes for this: grep or sed alone will suffice.

Tim Pote
  • Sed works like `/s/one/two/g` having 4 slashes. With `/^.*GET\s\+\/\(\w\+\).*$/s//\1/p` you have 5 slashes and `s` between. What is `s` here? – Timo May 28 '20 at 05:23
  • @Timo `What is s here?` `s` there is the sed substitution command. The `/regex/` in front of it is an address: the command after it is executed only on lines that match. So the `s` command is executed only if `/^.*GET\s\+\/\(\w\+\).*$/` matches. An empty regex reuses the last regex, so `s//` is equivalent to `s/^.*GET\s\+\/\(\w\+\).*$/`. The match is replaced by `\1`, the first matching group, i.e. `\(\w\+\)`. The `p` flag of the `s` command prints the result if the substitution was successful. For more information, see https://www.grymoire.com/Unix/Sed.html and the POSIX sed manual. – KamilCuk Aug 24 '20 at 21:18
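To see the empty-regex reuse described in that comment in isolation, here is a tiny sketch on made-up input (`demo_empty_regex` is a hypothetical name). The address `/b\(a\)r/` selects the line, and `s//X\1/` reuses that same regex, including its group, so `\1` is `a`:

```shell
# Hypothetical demo of sed's "empty regex reuses the last regex" rule.
demo_empty_regex() {
  # The address regex matches "bar"; s// reuses it, so "bar" becomes "X"
  # followed by group 1 ("a"). -n plus the p flag prints only that result.
  printf 'foo bar\n' | sed -n '/b\(a\)r/s//X\1/p'
}

# Usage: demo_empty_regex   (prints: foo Xa)
```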

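In this case, since the log file has a known structure, one option is to use cut to pull out the 7th field. cut splits fields on tabs by default, so -d ' ' is needed for this space-separated log; that yields /hello and /ss, and a second cut -c 2- drops the leading slash.

grep GET log.txt | cut -d ' ' -f 7 | cut -c 2-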
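A sketch of the same field-based approach in awk alone, which splits on runs of blanks rather than on a single delimiter character. It assumes every request line follows the sample's layout, so that `$6` is `"GET` (with its leading quote) and `$7` is the path; `get_paths_awk` is a hypothetical name:

```shell
# Hypothetical wrapper: field-based extraction without grep or cut.
get_paths_awk() {
  # Only lines whose 6th field is "GET (quote included) are request lines;
  # substr($7, 2) prints the 7th field without its leading "/".
  awk '$6 == "\"GET" { print substr($7, 2) }' "$@"
}

# Usage: get_paths_awk log.txt
```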
John Carter

I was trying to do this and came across this link: https://www.unix.com/shell-programming-and-scripting/153101-print-next-word-after-found-pattern.html

Summary: use grep to find matching lines, then use awk to find the pattern and print the next field:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}'

To count how often each distinct value occurs:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}' | \
  sort | \
  uniq -c
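The counting pipeline can also be sketched in awk alone: an associative array counts each path that follows a token ending in `GET`, and sort puts the results in a stable order, since `for (p in seen)` visits keys in no particular order. Each output line is `count path`, with the leading slash kept as in the grep and awk pipeline; `count_get_paths` is a hypothetical name:

```shell
# Hypothetical wrapper: count each requested path with a single awk pass.
count_get_paths() {
  # For every token ending in GET (e.g. "GET with its quote), count the
  # following field, which is the path. Print "count path" at the end.
  awk '{
    for (i = 1; i < NF; i++)
      if ($i ~ /GET$/) seen[$(i + 1)]++
  }
  END { for (p in seen) print seen[p], p }' "$@" | sort -k2
}

# Usage: count_get_paths logfile
```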
ajp619

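If you use grep, you can pipe one grep -o into another:

grep -o '/he[^ ]*' log.txt | grep -o '[^/].*'
grep -o '/ss[^ ]*' log.txt | grep -o '[^/].*'

The patterns are quoted so the shell does not expand them as filename globs. The first grep -o prints only the matched path (/hello), stopping at the first space. In the second, [^/] matches any single character other than /, so [^/].* prints everything from the first non-slash character onward (hello).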
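A sketch that avoids hard-coding hello and ss, assuming every request line contains `"GET /` immediately followed by the path, as in the sample: grep -o prints only text like `"GET /hello`, and cut -c 7- drops its first six characters. `get_paths_grep` is a hypothetical name:

```shell
# Hypothetical wrapper: match the request prefix, then cut it off.
get_paths_grep() {
  # -o prints only the matched text: quote, GET, space, slash and the
  # path up to the next space. cut -c 7- then removes those first six
  # characters, leaving just the path.
  grep -o '"GET /[^ ]*' "$@" | cut -c 7-
}

# Usage: get_paths_grep log.txt
```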

Charles Chow

It's often easier to use a pipeline rather than a single complex regular expression. This works on the data you provided:

grep -F GET /tmp/foo |
    grep -E -o 'GET (.*) HTTP' |
    sed -E 's/^GET \/(.+) HTTP/\1/'

This pipeline returns the following results:

hello
ss

There are certainly other ways to get the job done, but this patently works on the provided corpus.

Todd A. Jacobs
gawk '{match($7,/\/(\w+)/,a);} length(a[1]){print a[1]}' log.txt
hello
ss

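If you have gawk, the command above uses the three-argument form of match(), a gawk extension, to capture the path with a regex and store the captured group in the array a; the length(a[1]) pattern then prints a[1] only when it is non-empty.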
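The three-argument match() (and `\w`) are not available in every awk, for example the BSD awk on macOS. A sketch of a POSIX-awk equivalent, assuming the same log layout, uses the RSTART and RLENGTH variables that match() sets instead of a capture array; `get_paths_posix_awk` is a hypothetical name:

```shell
# Hypothetical wrapper: portable awk, no gawk extensions.
get_paths_posix_awk() {
  # match() sets RSTART (where the match begins) and RLENGTH (its length).
  # The match starts with the six characters quote, G, E, T, space, slash,
  # so the path begins at RSTART + 6 and is RLENGTH - 6 characters long.
  awk 'match($0, /"GET \/[^ ]*/) { print substr($0, RSTART + 6, RLENGTH - 6) }' "$@"
}

# Usage: get_paths_posix_awk log.txt
```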

P....