Using grep to get the next WORD after a match in each line

Question

I want to get the "GET" queries from my server logs.

For example, this is the server log

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -   
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

When I try with simple grep or awk,

Adi:~ adi$ awk '/GET/, /HTTP/' serverlogs.txt

it gives out

1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -

I just want to display only the words after GET: hello and ss

Is there any way this could be done?

Tim Pote · Accepted Answer · 2018-08-29T14:42:17.743

22

Assuming you have GNU grep, you can use a Perl-style regex to do a positive lookbehind:

grep -oP '(?<=GET\s/)\w+' file

If you don't have GNU grep, then I'd advise just using sed:

sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' file

If you happen to have GNU sed, that can be greatly simplified:

sed -n '/^.*GET\s\+\/\(\w\+\).*$/s//\1/p' file

The bottom line here is, you certainly don't need pipes to accomplish this. grep or sed alone will suffice.

edited Aug 29 '18 at 14:42

answered Jun 10 '12 at 19:58

Tim Pote


Sed works like `/s/one/two/g` having 4 slashes. With `/^.*GET\s\+\/\(\w\+\).*$/s//\1/p` you have 5 slashes and `s` between. What is `s` here? – Timo May 28 '20 at 05:23

@Timo `What is s here?` The `s` there is sed's substitution command. The `/regex/` before it is an address: the command that follows it runs only on lines that match. So the `s` command is executed only if `/^.*GET\s\+\/\(\w\+\).*$/` matches. An empty regex reuses the last regex, so `s//` is equivalent to `s/^.*GET\s\+\/\(\w\+\).*$/`. The match is replaced by `\1`, i.e. the first capture group, `\(\w\+\)`. The `p` flag of the `s` command prints the result if the substitution was made. For more information, see https://www.grymoire.com/Unix/Sed.html and the POSIX sed manual. – KamilCuk Aug 24 '20 at 21:18
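To see the idiom from the comment above in action, here is a small sketch that runs the portable sed command from the answer next to a plain `s///p` with the regex written out in full. The temporary file and the variable names are only for illustration; the log lines are copied from the question.

```shell
# Sample log copied from the question (temporary file is just for this demo)
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# Form from the answer: /regex/ selects the line, the empty regex in s// reuses it
with_address=$(sed -n '/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/s//\1/p' "$log")

# Same regex given directly to s; the p flag prints only lines where s succeeded
plain_s=$(sed -n 's/^.*GET[[:space:]]\{1,\}\/\([-_[:alnum:]]\{1,\}\).*$/\1/p' "$log")

printf '%s\n' "$with_address"
rm -f "$log"
```

Both variables should end up holding the same two lines, so the address form is a matter of style here rather than a requirement.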

score 12 · Answer 2 · answered Jun 10 '12 at 19:43


In this case since the log file has a known structure, one option is to use cut to pull out the 7th column (fields are denoted by tabs by default).

grep GET log.txt | cut -f 7

John Carter

Still showing out the entire line. 1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 - 1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 - – aditya.gupta Jun 10 '12 at 19:45

Hmmm, is it space or tab separated? If space, use `-d' '` with cut to specify space as the column delimiter. – John Carter Jun 10 '12 at 19:47

Works great with the **-d ' '** parameter. – aditya.gupta Jun 10 '12 at 20:08
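Putting the comments together, the space-delimited version might look like the sketch below. The temporary file and variable names are made up for the demo. Note that field 7 still carries the leading slash, so a second `cut -c 2-` is one possible way to drop it.

```shell
# Sample log copied from the question (temporary file is just for this demo)
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# With a space delimiter, field 6 is "GET and field 7 is the request path
paths=$(grep GET "$log" | cut -d ' ' -f 7)
printf '%s\n' "$paths"

# Optional extra step: keep characters 2 onward to strip the leading slash
names=$(printf '%s\n' "$paths" | cut -c 2-)
printf '%s\n' "$names"
rm -f "$log"
```

This relies on every request line having exactly one space between fields, which holds for the sample but is an assumption about the real log.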

score 5 · Answer 3 · answered Feb 20 '18 at 19:05

I was trying to do this and came across this link: https://www.unix.com/shell-programming-and-scripting/153101-print-next-word-after-found-pattern.html

Summary: use grep to find matching lines, then use awk to find the pattern and print the next field:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}'

If you want to know the unique occurrences:

grep pattern logfile | \
  awk '{for(i=1; i<=NF; i++) if($i~/pattern/) print $(i+1)}' | \
  sort | \
  uniq -c
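As a concrete instance, here is the same pipeline with `GET` as the pattern, run against the log from the question. The temporary file and the variable name are illustrative only. The field after the one matching `GET` is the request path, leading slash included.

```shell
# Sample log copied from the question (temporary file is just for this demo)
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# Walk the whitespace-separated fields; when one matches GET, print the next one
next_words=$(grep GET "$log" |
  awk '{for(i=1; i<=NF; i++) if($i~/GET/) print $(i+1)}')
printf '%s\n' "$next_words"

# Count how often each path occurs
printf '%s\n' "$next_words" | sort | uniq -c
rm -f "$log"
```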

score 4 · Answer 4 · answered Mar 07 '14 at 04:06


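Use a pipe if you use grep:

grep -o '/he[^ ]*' log.txt | grep -o '[^/].*'
grep -o '/ss' log.txt | grep -o '[^/].*'

The patterns are quoted so the shell does not try to expand them as globs, and the first one uses [^ ]* rather than .* so the match stops at the next space instead of running to the end of the line. [^/] matches any single character other than /, so the second grep starts its match at the first character after the leading slash and prints the rest.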
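A quick way to check what the second grep does on its own is to feed it a single path. This is only a sketch; the two input strings are the paths from the question and the variable names are made up.

```shell
# [^/] is a negated bracket expression: one character that is not a slash.
# The match therefore cannot start on the leading "/", and .* takes the rest.
first=$(printf '%s\n' '/hello' | grep -o '[^/].*')
second=$(printf '%s\n' '/ss' | grep -o '[^/].*')
printf '%s\n' "$first" "$second"
```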

Charles Chow

score 3 · Answer 5 · answered Jun 10 '12 at 19:51

It's often easier to use a pipeline rather than a single complex regular expression. This works on the data you provided:

grep -F GET /tmp/foo |
    grep -Eo 'GET (.*) HTTP' |
    sed -E 's/^GET \/(.+) HTTP/\1/'

This pipeline returns the following results:

hello
ss

There are certainly other ways to get the job done, but this patently works on the provided corpus.

score 1 · Answer 6 · answered Mar 28 '17 at 09:59


gawk '{match($7,/\/(\w+)/,a);} length(a[1]){print a[1]}' log.txt
hello
ss

If you have gawk, the command above uses the match function to select the desired value with a regex and stores the captured group in the array a.
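The three-argument form of match is a gawk extension. If only a POSIX awk is available, a rough equivalent can be sketched with the two-argument match, which sets RSTART and RLENGTH instead of filling an array. This is a swapped-in variant, not the command above: the character class [A-Za-z0-9_] stands in for \w, and the temporary file and variable name are illustrative.

```shell
# Sample log copied from the question (temporary file is just for this demo)
log=$(mktemp)
cat > "$log" <<'EOF'
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:32:27] "GET /hello HTTP/1.1" 404 -
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] code 404, message File not fo$
1.0.0.127.in-addr.arpa - - [10/Jun/2012 15:41:57] "GET /ss HTTP/1.1" 404 -
EOF

# match() returns 0 when field 7 has no "/word", so those lines are skipped.
# RSTART points at the slash; RSTART + 1 and RLENGTH - 1 skip past it.
names=$(awk 'match($7, /\/[A-Za-z0-9_]+/) { print substr($7, RSTART + 1, RLENGTH - 1) }' "$log")
printf '%s\n' "$names"
rm -f "$log"
```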

P....
