0

Given a very large list A, C, A, D, A, B, C, D, A, C

How can I get the counts of the current element and the next efficiently? Something like:

AC, CA, AD, DA, AB, BC, CD, DA, AC

A : {A:0, B:1, C:2, D:1}
B : {A:0, B:0, C:1, D:0}
C : {A:1, B:0, C:0, D:1}
D : {A:2, B:0, C:0, D:0}

Or if I were to print it, it would produce:

    A   B   C   D
A       1   2   1

B           1

C   1           1

D   2
user3838436
  • 83
  • 1
  • 7

3 Answers3

4

If your input is large and of unknown length (streaming in possibly), then using iterators is ideal. The output table doesn't include entries for zero counts, because I'm not assuming that you know the set of all possible input items.

from itertools import tee, izip

# from http://stackoverflow.com/questions/5764782/iterate-through-pairs-of-items-in-a-python-list
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

inp = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

table = {}
for a, b in pairwise(inp):
    table.setdefault(a, {})
    table[a].setdefault(b, 0)
    table[a][b] += 1

print(table)
Russel Simmons
  • 361
  • 1
  • 6
  • Set default to a `Counter()` instead of `{}`, so an occasional access of a missing key returns 0 as in the OP, and also eliminates the need for the second `setdefault` since counters initialise with 0 count. – Moses Koledoye Mar 15 '17 at 17:48
2

You can use a dictionary of Counters:

from collections import Counter
import itertools

myList = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

d = {x:Counter() for x in set(myList)}

for x,y in zip(myList,itertools.islice(myList,1,None)):
    d[x].update(y)

print(d)

Output:

{'B': Counter({'C': 1}), 'A': Counter({'C': 2, 'B': 1, 'D': 1}), 'C': Counter({'A': 1, 'D': 1}), 'D': Counter({'A': 2})}

It is reasonably efficient in Python 3, especially after incorporating @Rawing's excellent idea of using itertools.islice(). I tested it on:

myList = [random.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ") for i in range(10**6)]

and it takes about a half a second on my machine, less than the time spent in constructing the list in the first place.

John Coleman
  • 51,337
  • 7
  • 54
  • 119
  • 2
    `myList[1:]` should be `itertools.islice(myList, 1, None)` to avoid duplicating the list. (Since OP has stated that it is very large) – Aran-Fey Mar 15 '17 at 17:44
  • @Rawing That is a good idea. With the example of 1 million entries the speedup was slight but noticeable. Doubtless it would become more important as the size gets even larger, not to mention being more efficient with memory. Thanks. – John Coleman Mar 15 '17 at 17:56
1

You could use collections.Counter to count the occurrences of the elements and then convert that to a 2D dictionary:

import itertools
from collections import Counter

l = ['A', 'C', 'A', 'D', 'A', 'B', 'C', 'D', 'A', 'C']

# create a list of pairs of neighboring elements
neighbors = zip(l, itertools.islice(l, 1, None))

# count occurrences    
counts = Counter(neighbors)

# convert counts to a 2D dictionary
output = {}
for k in counts:
    if k[0] not in output:
        output[k[0]] = {}
    output[k[0]][k[1]] = counts[k]
print(output)

This will print

{'C': {'D': 1, 'A': 1}, 'D': {'A': 2}, 'A': {'C': 2, 'D': 1, 'B': 1}, 'B': {'C': 1}}
robodasha
  • 286
  • 6
  • 21