I am parsing log files from my content delivery network (CDN). So far I can isolate one field of each log line: the IP address that accessed our website. What I want is a top-10 (or so) list of the most frequent IP addresses, taken from a large list containing every IP address. Some example data I get when I print the list looks like this:
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.1
192.168.1.1
192.168.1.1
These are not the real IPs from my output, and there are many more. As you can see, identical addresses are not grouped together. How would I count how often each address occurs and list the most frequent ones?
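To make the goal concrete, here is a minimal sketch of the result I am after, using only the sample addresses above. The name `ip_list` is hypothetical, and `collections.Counter` is just one way this could be done, not necessarily the best one for my real data.

```python
from collections import Counter

# Hypothetical stand-in for the real list: the same 12 sample addresses as above
ip_list = ['192.168.1.1'] * 5 + ['192.168.1.2'] * 4 + ['192.168.1.1'] * 3

# Count every address once over the whole list, then take the most frequent ones
counts = Counter(ip_list)
top = counts.most_common(10)
print(top)  # [('192.168.1.1', 8), ('192.168.1.2', 4)]
```

So the output I want is each distinct address paired with its total count, sorted from most to least frequent.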
Edit: Here is my code:
import gzip
from collections import Counter
logFileName = open('C:\\Users\\Pawlaczykm\\Desktop\\fileNames.txt', 'r')
for line in logFileName.readlines():
    print 'Summary of: ' + line
    # use gzip to decompress the file
    with gzip.open('C:\\Users\\Pawlaczykm\\Desktop\\' + line.rstrip() + '.gz', 'rb') as f:
        for eachLine in f:
            parts = eachLine.split('\t')
            if len(parts) > 1:
                ipAdd = parts[2]
                c = Counter(ipAdd.splitlines())
                print(c.most_common(10))
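One thing that may matter in the code above is that a new `Counter` is built for every line, so each one only ever sees a single address. A possible alternative, sketched below under assumptions, is to keep one counter and update it line by line. The helper name `count_ips`, its parameters, and the sample log lines are all made up for illustration; the only thing taken from my code is that the IP is the third tab-separated field.

```python
from collections import Counter


def count_ips(lines, counter=None, column=2):
    """Add the address in tab-separated field `column` of each line to `counter`.

    Hypothetical helper: `lines` is any iterable of text lines, such as an
    open file object. Lines with too few fields are skipped.
    """
    if counter is None:
        counter = Counter()
    for line in lines:
        parts = line.rstrip('\r\n').split('\t')
        # Only index the field if the line actually has that many fields
        if len(parts) > column:
            counter[parts[column]] += 1
    return counter


# Made-up log lines, only to exercise the helper; the real format may differ
sample_lines = [
    "2015-01-01\tGET\t192.168.1.1\t/index.html\n",
    "2015-01-01\tGET\t192.168.1.2\t/index.html\n",
    "2015-01-01\tGET\t192.168.1.1\t/about.html\n",
    "malformed line without tabs\n",
    "2015-01-01\tGET\t192.168.1.1\t/contact.html\n",
]

totals = count_ips(sample_lines)
print(totals.most_common(10))  # [('192.168.1.1', 3), ('192.168.1.2', 1)]
```

With real files, the same counter could be passed in again for each log, for example `count_ips(f, totals)` where `f` comes from `gzip.open(path, 'rt')` in Python 3, and `most_common(10)` would be printed once after the loop. Whether the summary should be per file or across all files is a design choice; a fresh `Counter()` per file would give per-file summaries.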