4

I have a log file containing lines like:

66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"
220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"

I want to collect the IP address and user agent from every line into a file:

66.249.74.18 "Googlebot-Image/1.0"
220.181.108.96 "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

How can I do it with awk?

I know that awk '{print $1}' lists all the IPs and awk -F\" '{print $6}' lists all the user agents, but I have no idea how to combine them into a single output.

Zombo
yang

4 Answers

3
awk -F' - |\\"' '{print $1, $7}' temp1

output:

66.249.74.18 Googlebot-Image/1.0
220.181.108.96 Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

temp1 file:

66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"
220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"     "-"
Scy
2
awk '{print $1,$6}' FPAT='(^| )[0-9.]+|"[^"]*"'
  • FPAT (a GNU awk variable) defines what a field looks like; here a field is either
    • the beginning of the line or a space, followed by [0-9.]+, or
    • a double-quoted string, "[^"]*"
  • then print fields 1 and 6
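
To see which fields this definition yields, the loop below emulates it with the POSIX match() function, so it should also run on awks that lack FPAT. It is only a sketch: the sample line is copied from the question, the names line, fields, rest and n are illustrative, and the ^ anchor may behave differently from gawk's FPAT because each match is cut off the front of the remaining string.

```shell
# Sample line copied from the question.
line='66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"'

# Repeatedly take the leftmost match of the field pattern and print it
# with its index; the brackets make any leading space visible.
fields=$(printf '%s\n' "$line" | awk '{
    rest = $0
    n = 0
    while (match(rest, /(^| )[0-9.]+|"[^"]*"/)) {
        n++
        printf "%d: [%s]\n", n, substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}')
printf '%s\n' "$fields"
```

On this line the first match is the IP and the sixth is the quoted user agent, which is why the answer prints $1 and $6.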
Zombo
  • Is there a way to add quote marks to the IP fields first? Then I could use awk -F\" '{print $2 $8}' to get the right result. – yang Apr 21 '13 at 08:21
2

A portable approach not using GNU extensions:

awk '{printf "%s ",$1;for(i=12;i<NF;i++)printf "%s ",$i;printf "\n"}' file
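
Another portable possibility, sketched here under the assumption that the user agent is always the third quoted string on the line, is to keep the default whitespace splitting for $1 and split the whole line on double quotes separately. The temporary file and the names log, result and q are illustrative.

```shell
# Write the two sample lines from the question to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
66.249.74.18 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "-"
220.181.108.96 - - [21/Apr/2013:05:55:33 +0000] 200 "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "-"
EOF

# $1 is the IP (default whitespace splitting); q[6] is the text between
# the fifth and sixth double quote, i.e. the user agent without quotes.
result=$(awk '{
    split($0, q, "\"")
    print $1, "\"" q[6] "\""
}' "$log")
printf '%s\n' "$result"
rm -f "$log"
```

Because the split is on quotes rather than spaces, the number of words inside the user agent does not matter, and the quotes are added back to match the output format asked for in the question.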
Chris Seymour
  • it returns: "66.249.74.18" "GET /1.jpg HTTP/1.1" 7691 "-" "Googlebot-Image/1.0" "220.181.108.96" "GET /1.html HTTP/1.1" 17722 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" but awk '{printf "%s ",$1;for(i=7;i – yang Apr 22 '13 at 05:00
  • @user432506 yes, you want to start from `i=12`; see the update. If this solves your problem, please upvote and accept this answer. To accept the answer, click the tick mark next to it; this will show the question as solved. – Chris Seymour Apr 22 '13 at 08:45
  • How can I select the data for a particular date from a file using awk? @iiSeymour – saikiran Nov 27 '14 at 10:29
1

Using perl:

perl -nle '/^((?:\d+\.?){4})(?:.+?"){4}\s+(".*?")/ && print "$1 $2"' access_log

The trick lies in matching, four times over, a run of characters followed by a double quote: (?:.+?"){4}. Here's a visual description of the regexp: https://regex101.com/r/xP0kF4/4

The regexp is more complex than those in the previous answers, but it can easily be extended to parse other fields.

luissquall