Grouping together "like values" from a list

Question

I am in the process of parsing log files I get from my Content Delivery Network. I have gotten to the point where I am able to isolate one part of the log file, which is what IP address accessed our website. What I want to achieve here is a top 10 or so list of IP addresses from a large list of every IP address. Some example data I get when I print the list looks like this:

192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.1
192.168.1.1
192.168.1.1

These are not the real IPs I get in the output, and there are many more. As you can see, though, they are not grouped together. How would I do something like this?

Edit: Here is my code

import gzip
from collections import Counter
logFileName = open('C:\\Users\\Pawlaczykm\\Desktop\\fileNames.txt', 'r')
for line in logFileName.readlines():
    print 'Summary of: ' + line
    # use gzip to decompress the file
    with gzip.open('C:\\Users\\Pawlaczykm\\Desktop\\' + line.rstrip() + '.gz', 'rb') as f:
        for eachLine in f:
            parts = eachLine.split('\t')
            if len(parts) > 1:
                ipAdd = parts[2]
                c = Counter(ipAdd.splitlines())
                print(c.most_common(10))

`sort` the list, and use `itertools.groupby` if you actually want them grouped together — Chris_Rands, Feb 14 '17 at 16:37
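
A minimal sketch of what this comment suggests, assuming the addresses are already in a plain Python list (the name `addresses` and its contents are made up for illustration). `itertools.groupby` only merges adjacent equal items, which is why the list is sorted first:

```python
from itertools import groupby

# Hypothetical sample data, not real log output.
addresses = ["192.168.1.1", "192.168.1.2", "192.168.1.1",
             "192.168.1.2", "192.168.1.1"]

# groupby only groups adjacent equal items, so sort before grouping.
grouped = [(ip, len(list(group))) for ip, group in groupby(sorted(addresses))]

# Order by count, highest first, and keep at most the top 10.
top = sorted(grouped, key=lambda pair: pair[1], reverse=True)[:10]
print(top)
# [('192.168.1.1', 3), ('192.168.1.2', 2)]
```

If only the counts are needed, `collections.Counter` avoids the sort altogether.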

Mike Müller · Accepted Answer · 2017-02-14T17:30:06.480


You can use collections.Counter for this:

s = """192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.1
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.2
192.168.1.1
192.168.1.1
192.168.1.1"""

from collections import Counter
c = Counter(s.splitlines())

Now you can get the 10 most common addresses, i.e. the top-10 list:

print(c.most_common(10))

Output:

[('192.168.1.1', 8), ('192.168.1.2', 4)]

This is a list of the addresses and their counts.

In your case, you need to give the counter all the addresses:

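addresses = []
for eachLine in f:
    parts = eachLine.split('\t')
    # parts[2] needs at least three tab-separated fields
    if len(parts) > 2:
        ipAdd = parts[2]
        addresses.append(ipAdd.strip())
# count once, after all addresses have been collected
c = Counter(addresses)
print(c.most_common(10))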
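
For completeness, here is a self-contained Python 3 sketch of the same idea. It assumes, as in the question, that the IP address is the third tab-separated field. The helper names `count_addresses` and `count_addresses_in_gzip` and the sample lines are invented for illustration and are not part of the original code:

```python
import gzip
from collections import Counter


def count_addresses(lines, column=2):
    """Count the value found in one tab-separated column across all lines."""
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        # Skip lines that are too short to contain the column.
        if len(parts) > column:
            counts[parts[column].strip()] += 1
    return counts


def count_addresses_in_gzip(path, column=2):
    """Same counting, reading a gzip-compressed log opened in text mode."""
    with gzip.open(path, "rt") as f:
        return count_addresses(f, column)


# Made-up lines standing in for a decompressed log file.
sample = [
    "2017-02-14\t16:00\t192.168.1.1\n",
    "2017-02-14\t16:01\t192.168.1.2\n",
    "2017-02-14\t16:02\t192.168.1.1\n",
    "malformed line\n",
    "2017-02-14\t16:03\t192.168.1.2\n",
    "2017-02-14\t16:04\t192.168.1.1\n",
]

c = count_addresses(sample)
print(c.most_common(10))
# [('192.168.1.1', 3), ('192.168.1.2', 2)]
```

To get one top-10 list across several log files, the per-file counters can be merged into a single one with `Counter.update`, which adds the counts together.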

edited Feb 14 '17 at 17:30

answered Feb 14 '17 at 16:39

I tried doing this, but all it did was add a 1 at the end of every record. It did not actually group them or count how many there were of each record. – mattp341 Feb 14 '17 at 16:57
Are all the records different? – Mike Müller Feb 14 '17 at 17:00
Some are, but I am getting the exact same output as before I applied what you suggested, except a "1" has been appended to the end of each record. – mattp341 Feb 14 '17 at 17:05
Don't make new counter for each IP address. Collect them all and then count. – Mike Müller Feb 14 '17 at 17:23
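
To illustrate that last comment: a `Counter` created inside the loop only ever sees one address, so every count is 1, which seems to match the output described above. A small sketch with made-up data:

```python
from collections import Counter

# Hypothetical sample data.
addresses = ["192.168.1.1", "192.168.1.1", "192.168.1.2"]

# One Counter per address: each one sees a single item, so each count is 1.
per_line = [Counter([ip]).most_common(10) for ip in addresses]
print(per_line)
# [[('192.168.1.1', 1)], [('192.168.1.1', 1)], [('192.168.1.2', 1)]]

# One Counter for the whole list: equal addresses are tallied together.
total = Counter(addresses)
print(total.most_common(10))
# [('192.168.1.1', 2), ('192.168.1.2', 1)]
```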
