Running count of first and second element of a list

Question

Given a very large list A, C, A, D, A, B, C, D, A, C

How can I get the counts of the current element and the next efficiently? Something like:

AC, CA, AD, DA, AB, BC, CD, DA, AC

A : {A:0, B:1, C:2, D:1}
B : {A:0, B:0, C:1, D:0}
C : {A:1, B:0, C:0, D:1}
D : {A:2, B:0, C:0, D:0}

Or if I were to print it, it would produce:

    A   B   C   D
A       1   2   1

B           1

C   1           1

D   2

score 4 · Answer 1 · answered Mar 15 '17 at 17:18

If your input is large and of unknown length (streaming in possibly), then using iterators is ideal. The output table doesn't include entries for zero counts, because I'm not assuming that you know the set of all possible input items.

from itertools import tee, izip

# from http://stackoverflow.com/questions/5764782/iterate-through-pairs-of-items-in-a-python-list
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

inp = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

table = {}
for a, b in pairwise(inp):
    table.setdefault(a, {})
    table[a].setdefault(b, 0)
    table[a][b] += 1

print(table)

Set default to a `Counter()` instead of `{}`, so an occasional access of a missing key returns 0 as in the OP, and also eliminates the need for the second `setdefault` since counters initialise with 0 count. — Moses Koledoye, Mar 15 '17 at 17:48

John Coleman · Answer 2 · 2017-03-15T17:54:20.993

2

You can use a dictionary of Counters:

from collections import Counter
import itertools

myList = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

d = {x:Counter() for x in set(myList)}

for x,y in zip(myList,itertools.islice(myList,1,None)):
    d[x].update(y)

print(d)

Output:

{'B': Counter({'C': 1}), 'A': Counter({'C': 2, 'B': 1, 'D': 1}), 'C': Counter({'A': 1, 'D': 1}), 'D': Counter({'A': 2})}

It is reasonably efficient in Python 3, especially after incorporating @Rawing's excellent idea of using itertools.islice(). I tested it on:

myList = [random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for i in range(10**6)]

and it takes about a half a second on my machine, less than the time spent in constructing the list in the first place.

edited Mar 15 '17 at 17:54

answered Mar 15 '17 at 17:27

John Coleman

51,337
7
54
119

2

`myList[1:]` should be `itertools.islice(myList, 1, None)` to avoid duplicating the list. (Since OP has stated that it is very large) – Aran-Fey Mar 15 '17 at 17:44
@Rawing That is a good idea. With the example of 1 million entries the speedup was slight but noticeable. Doubtless it would become more important as the size gets even larger, not to mention being more efficient with memory. Thanks. – John Coleman Mar 15 '17 at 17:56

robodasha · Answer 3 · 2017-03-15T17:42:03.177

You could use collections.Counter to count the occurrences of the elements and then convert that to a 2D dictionary:

import itertools
from collections import Counter

l = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

# create a list of pairs of neighboring elements
neighbors = zip(l, itertools.islice(l, 1, None))

# count occurrences    
counts = Counter(neighbors)

# convert counts to a 2D dictionary
output = {}
for k in counts:
    if k[0] not in output:
        output[k[0]] = {}
    output[k[0]][k[1]] = counts[k]
print(output)

This will print

{'C': {'D': 1, 'A': 1}, 'D': {'A': 2}, 'A': {'C': 2, 'D': 1, 'B': 1}, 'B': {'C': 1}}

Running count of first and second element of a list

3 Answers3