I am trying to use AWK to count accesses to a specific URL that always begins with "shop/product?traffic=ads", but so far without success.

The following code counts how often each IP address has accessed this URL:

awk -F'[ "]+' '$7 == "/shop/product?traffic=ads" { ipcount[$1]++ }
END { for (i in ipcount) {
    printf "%15s - %d\n", i, ipcount[i] } }' /var/www/vhosts/domain.com/logs/access_ssl_log

Here is a sample of the access log (the input file):

66.249.68.xx- - [19/Dec/2022:09:14:15 +0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"

The "gclid" and the "dt" (session cookie) values are dynamic.

I tried placing ^ after ads and before /shop, but that gave no results.

For example, I want the following output:

6 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
1 Clicks from 80.187.75.xx to https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791
Arvind Kumar Avinash
nils50122
  • please update the question with a few lines from the log ... some that match, some that don't match; also include the expected output (corresponding to the provided sample input) – markp-fuso Dec 23 '22 at 17:35
  • `... $7 ~ "^/shop/product?traffic=ads" ...` – jhnc Dec 24 '22 at 06:30
  • @jhnc: I tried that already; it does not work (no results). – nils50122 Dec 24 '22 at 09:28
  • @markp-fuso: question is updated. – nils50122 Dec 24 '22 at 09:38
  • Please [edit] your question to either fix your example so we can get the expected output you show from the sample input you show or explain that mapping given the example you currently have. For us to best be able to help you, we need you to provide a [mcve] with concise sample input/expected output we can copy/paste to test a potential solution with – Ed Morton Dec 24 '22 at 12:37
  • You also need to escape `?` : `... $7 ~ "^/shop/product[?]traffic=ads" ...` – jhnc Dec 24 '22 at 15:47
  • Your output does not match the example data provided. Also, the 80.187.75.xx output line seems to be comparing against `$10` not `$7` – jhnc Dec 24 '22 at 16:16
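
The comments above point at two separate problems: `==` only matches the exact string, and an unescaped `?` in a regex is a quantifier rather than a literal question mark. A minimal sketch that compares the variants on one path follows; the `gclid=abc` value is a shortened, hypothetical stand-in for the real one.

```shell
#!/bin/sh
# Hypothetical request path, shortened from the second sample log line.
url='/shop/product?traffic=ads&gclid=abc'

result=$(printf '%s\n' "$url" | awk '{
  # 1) Exact comparison: fails because the real URL carries a dynamic suffix.
  exact = ($0 == "/shop/product?traffic=ads") ? "yes" : "no"
  # 2) Regex with a bare "?": it makes the preceding "t" optional, so no literal "?" is matched.
  bare = ($0 ~ "^/shop/product?traffic=ads") ? "yes" : "no"
  # 3) Regex with "[?]": the bracket expression matches a literal question mark.
  escaped = ($0 ~ "^/shop/product[?]traffic=ads") ? "yes" : "no"
  # 4) index() == 1: plain-string prefix test, no regex escaping needed.
  prefix = (index($0, "/shop/product?traffic=ads") == 1) ? "yes" : "no"
  print "exact=" exact, "bare=" bare, "escaped=" escaped, "prefix=" prefix
}')
echo "$result"
```

This should print `exact=no bare=no escaped=yes prefix=yes`, which is consistent with the "no results" reported for the first two forms.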

2 Answers


You can check whether the string occurs anywhere in field 7 using index(). Then use field 1 and field 7, joined by a space, as the array key, so that both values can be recovered in the END block by splitting the key on that space.

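awk -F'[ "]+' 'index($7, "/shop/product?traffic=ads") { ipcount[$1 " " $7]++ }
END {
  for (i in ipcount) {
    split(i, a, " ")   # a[1] is the IP, a[2] is the URL
    # Pass the data as arguments, not as the format string, so a "%" in a URL is printed literally
    printf "%d Clicks from %s to %s\n", ipcount[i], a[1], a[2]
  }
}' file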

Test data

66.249.68.xx- - [19/Dec/2022:09:14:15 +0100] "GET /shop/other-product/1.0" 404 16996 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.5304.xxx Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"
80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=EAIaIQobChMIg_Ks5vWF_AIVAgGLCh3k_gBKEAAYASAAEgKBOfD_BwE&dt=1671461107791" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
109.42.242.xxx - - [19/Dec/2022:09:15:55 +0100] "GET /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0 (Linux; Android 11; SM-A515F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36"

Output

1 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Aj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
2 Clicks from 109.42.242.xxx to /shop/product?traffic=ads&gclid=Cj0KCQiAtICdBhCLARIsALUBFcFMmvFbA_1EyTTMRDp9IWhDXFA_HCeuEsIBXl886PoaAapen2KdussaAniSEALw_wcB
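
The expected output in the question also lists 80.187.75.xx, where the string appears in the referer rather than in the requested path. One possible way to cover that case is to split each line on double quotes, so the request and the referer become separate fields. The sketch below assumes the log layout shown in the question with no embedded quotes inside the quoted fields; the two log lines are shortened, hypothetical copies of the samples, and `want` is an illustrative variable name.

```shell
#!/bin/sh
# Two shortened, hypothetical log lines in the same layout as the sample data.
log='109.42.242.xxx - - [19/Dec/2022:09:14:55 +0100] "GET /shop/product?traffic=ads&gclid=abc HTTP/1.0" 200 30589 "https://www.google.com/" "Mozilla/5.0"
80.187.75.xx - - [20/Dec/2022:06:40:12 +0100] "GET /shop/product HTTP/1.0" 200 10821 "https://www.example.com/shop/product?traffic=ads&gclid=xyz" "Mozilla/5.0"'

result=$(printf '%s\n' "$log" | awk -F'"' -v want='/shop/product?traffic=ads' '
{
  split($1, head, " ")   # head[1] is the client IP
  split($2, req, " ")    # req[2] is the requested path
  target = ""
  if (index(req[2], want)) {
    target = req[2]      # string found in the request itself
  } else if (index($4, want)) {
    target = $4          # string found in the referer
  }
  if (target != "") count[head[1] " " target]++
}
END {
  for (key in count) {
    split(key, part, " ")
    printf "%d Clicks from %s to %s\n", count[key], part[1], part[2]
  }
}' | sort)
echo "$result"
```

The final `sort` is only there to make the order stable, since awk does not guarantee any particular order for `for (key in count)`.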
The fourth bird
  • Can we add the timestamps (DD/MM/YYYY + HH:MM:SS) when a user from the same IP clicks more than once, so that it can be checked? Ideally the output would show the click count first, then the IP, then the timestamps, and then the URL. – nils50122 Dec 30 '22 at 15:34

With your shown samples, please try the following awk code. It uses the match() function with the regex \/shop\/product\?traffic=ads\S+ (each / is escaped to match a literal /; note that \S is a GNU awk extension). If a match is found, it increments an array element whose index is $1, FS, and the matched text. The END block then prints the counts in the required format.

awk '
match($7,/\/shop\/product\?traffic=ads\S+/){
  value[$1 FS substr($7,RSTART,RLENGTH)]++
}
END{
  for(i in value){
    split(i,arr)
    print value[i] " Clicks from " arr[1]  " to " arr[2]
  }
}
'  Input_file
RavinderSingh13
  • Thanks, this code works. Can we add the timestamps (DD/MM/YYYY + HH:MM:SS) when a user from the same IP clicks more than once, so that it can be checked? Ideally the output would show the click count first, then the IP, then the timestamps, and then the URL. After that I need to send it by mail. – nils50122 Jan 02 '23 at 15:40