I am parsing log files that I get from my Content Delivery Network. So far I can isolate one field of each log line: the IP address that accessed our website. From the full list of IP addresses, I want to produce a list of the top 10 or so most frequent ones. Some example data I get when I print the list looks like this:

192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.1
192.168.1.1
192.168.1.1

These are not the real IPs that I get from the output, and there are many more. As you can see, though, identical addresses are not grouped together. How would I do something like this?

Edit: Here is my code

import gzip
from collections import Counter
logFileName = open('C:\\Users\\Pawlaczykm\\Desktop\\fileNames.txt', 'r')
for line in logFileName.readlines():
    print 'Summary of: ' + line
    # use gzip to decompress the file
    with gzip.open('C:\\Users\\Pawlaczykm\\Desktop\\' + line.rstrip() + '.gz', 'rb') as f:
        for eachLine in f:
            parts = eachLine.split('\t')
            if len(parts) > 1:
                ipAdd = parts[2]
                c = Counter(ipAdd.splitlines())
                print(c.most_common(10))
mattp341

1 Answer

You can use collections.Counter for this:

s = """192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.1
192.168.1.1
192.168.1.1"""

from collections import Counter
c = Counter(s.splitlines())

Now you can get the 10 most common addresses, i.e. the top-10 list:

print(c.most_common(10))

Output:

[('192.168.1.1', 8), ('192.168.1.2', 4)]

This is a list of the addresses and their counts, ordered from most to least common.
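
If you want a readable report rather than the raw list, you can unpack each (address, count) pair. This is only a small sketch using the sample data from the question; the column width of 15 is an arbitrary choice for illustration:

```python
from collections import Counter

# Sample data taken from the question.
addresses = ["192.168.1.1"] * 5 + ["192.168.1.2"] * 4 + ["192.168.1.1"] * 3
c = Counter(addresses)

# most_common(n) returns (address, count) pairs, highest count first.
lines = ["{:<15} {}".format(ip, n) for ip, n in c.most_common(10)]
print("\n".join(lines))
```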

In your code, a new Counter is created for every line, so each one only ever sees a single address. You need to give the counter all the addresses:

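addresses = []
for eachLine in f:
    parts = eachLine.split('\t')
    # the IP address is the third field, so at least three parts are needed
    if len(parts) > 2:
        ipAdd = parts[2]
        addresses.append(ipAdd.strip())
# count once, after all the lines have been read
c = Counter(addresses)
print(c.most_common(10))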
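
For large logs you may not want to hold every address in a list. One possible variation, not taken from the code above, is to increment the counter while streaming each file. The sketch below assumes Python 3, tab-separated lines with the IP address in the third field (as in the question), and a hypothetical helper name count_ips:

```python
import gzip
from collections import Counter


def count_ips(paths, column=2):
    """Count the values of one tab-separated column across gzipped logs.

    count_ips and its parameters are hypothetical names for this sketch.
    """
    counts = Counter()
    for path in paths:
        # 'rt' reads the decompressed stream as text (Python 3).
        with gzip.open(path, 'rt') as f:
            for line in f:
                parts = line.rstrip('\n').split('\t')
                # Skip lines that are too short to contain the column.
                if len(parts) > column:
                    counts[parts[column].strip()] += 1
    return counts
```

Calling count_ips on a list of .gz paths and then most_common(10) on the result gives one top-10 list across all the files; to get a separate summary per file, call it with one path at a time.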
Mike Müller