I would like to extract from a list of log files (named access.log.*
) that look like this
95.11.113.x - [15/Nov/2013:18:25:17 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
95.11.113.x - [15/Nov/2013:18:25:19 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
95.11.113.x - [15/Nov/2013:18:25:21 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
125.111.9.x - [15/Nov/2013:20:00:00 +0100] "GET /files/azeazzae.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
132.41.100.x - [16/Nov/2013:11:15:11 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
132.41.100.x - [16/Nov/2013:11:15:11 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
132.41.100.x - [16/Nov/2013:11:15:11 +0100] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
a list of unique visitors (only one repetition per day) who visited /files/myfile.rar
, i.e. :
95.11.113.x - [15/Nov/2013] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
132.41.100.x - [16/Nov/2013] "GET /files/myfile.rar HTTP/1.1" 200 2437305154 blah.com "http://www.blah.com/files/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)" "-"
I tried to open files and look for the desired string /files/myfile.rar
like this : Search for string in txt file Python, but I didn't achieve to test for "same IP adress" and repetitions.
What should I use to do this? Standard string search, one line after another (Search for string in txt file Python) ? Regexp?
PS : even better for future use (sorting per date, etc.) :
2013-11-15 - 95.11.113.x - "GET /files/myfile.rar HTTP/1.1"
2013-11-16 - 132.41.100.x - "GET /files/myfile.rar HTTP/1.1"
2013-11-17 ....