1

So I am trying to implement code that will count the next letter in a sentence, using python. so for instance,

"""So I am trying to implement code that will count the next letter in a sentence, using 
python"""

most common letters one after the other

  1. for 's'

    • 'o' :1
    • 'e' :1
  2. for 'o'

    • ' ' :1
    • 'd' :1
    • 'u' :1
    • 'n' :1

I think you get the idea

I already have written code for counting letters prior

def count_letters(word, char):
    count = 0
    for c in word:
        if char == c:
            count += 1
    return count

As you can see this just counts for letters, but not the next letter. can someone give me a hand on this one?

pault
  • 41,343
  • 15
  • 107
  • 149
  • 1
    Possible duplicate of [Markov chain on letter scale and random text](https://stackoverflow.com/questions/8660015/markov-chain-on-letter-scale-and-random-text) – Andrey Tyukin Mar 30 '18 at 19:24

4 Answers4

3
from collections import Counter, defaultdict

counts = defaultdict(Counter)

s = """So I am trying to implement code that will count the next letter in a sentence, using
python""".lower()

for c1, c2 in zip(s, s[1:]):
    counts[c1][c2] += 1

(apart from being simpler, this should be significantly faster than pault's answer by not iterating over the string for every letter)

Concepts to google that aren't named in the code:

  • for c1, c2 in ... (namely the fact that there are two variables): tuple unpacking
  • s[1:]: slicing. Basically this is a copy of the string after the first character.
Alex Hall
  • 34,833
  • 5
  • 57
  • 89
  • Looks good. I will add here some further reading: [Understanding python's slice notation](https://stackoverflow.com/questions/509211/understanding-pythons-slice-notation) and [Zip lists in python](https://stackoverflow.com/questions/13704860/zip-lists-in-python) – pault Mar 30 '18 at 19:53
  • Oh, wow... The `defaultdict` is definitely something I should add to my toolbox. Thanks! – Andrey Tyukin Mar 30 '18 at 19:55
1

Here's a way using collections.Counter:

Suppose the string you provided was stored in a variable s.

First we iterate over the set of all lower case letters in s. We do this by making another string s_lower which will convert the string s to lowercase. We then wrap this with the set constructor to get unique values.

For each char, we iterate through the string and check to see if the previous letter is equal to char. If so, we store this in a list. Finally, we pass this list into the collections.Counter constructor which will count the occurrences.

Each counter is stored in a dictionary, counts, where the keys are the unique characters in the string.

from collections import Counter

counts = {}
s_lower = s.lower()
for char in set(s_lower):
    counts[char] = Counter(
        [c for i, c in enumerate(s_lower) if i > 0 and s_lower[i-1] == char]
    )

For your string, this has the following outputs:

>>> print(counts['s'])
#Counter({'i': 1, 'e': 1, 'o': 1})

>>> print(counts['o'])
#Counter({' ': 2, 'd': 1, 'n': 1, 'u': 1})

One caveat is that this method will iterate through the whole string for each unique character, which could potentially make it slow for large lists.


Here is an alternative approach using collections.Counter and collections.defaultdict that only loops through the string once:

from collections import defaultdict, Counter

def count_letters(s):
    s_lower = s.lower()
    counts = defaultdict(Counter)
    for i in range(len(s_lower) - 1):
        curr_char = s_lower[i]
        next_char = s_lower[i+1]
        counts[curr_char].update(next_char)
    return counts

counts = count_letters(s)

We loop over each character in the string (except the last) and on each iteration we update a counter using the next character.

pault
  • 41,343
  • 15
  • 107
  • 149
1

Here is a relatively terse way to do it:

from itertools import groupby
from collections import Counter

def countTransitionFrequencies(text):
  prevNext = list(zip(text[:-1], text[1:]))
  prevNext.sort(key = lambda pn: pn[0])
  transitions = groupby(prevNext, lambda pn: pn[0])
  freqs = map(
    lambda kts: (kts[0], Counter(map(lambda kv: kv[1], kts[1]))), 
    transitions
  )
  return freqs

Explanation:

  1. zip creates list of pairs with (previous, next) characters
  2. The pairs are sorted and grouped by the previous character
  3. The frequencies of the next characters (extracted from pairs by kv[1]) are then counted using Counter.

Sorting is not really necessary, but unfortunately, this is how the provided groupby works.

An example:

for k, v in countTransitionFrequencies("hello world"):
  print("%r -> %r" % (k, v))

This prints:

' ' -> Counter({'w': 1})
'e' -> Counter({'l': 1})
'h' -> Counter({'e': 1})
'l' -> Counter({'l': 1, 'o': 1, 'd': 1})
'o' -> Counter({' ': 1, 'r': 1})
'r' -> Counter({'l': 1})
'w' -> Counter({'o': 1})
Andrey Tyukin
  • 43,673
  • 4
  • 57
  • 93
0

This should work, the only thing is it doesn't sort the values, but that can be solved by creating a new dictionary with list of tuples (char, occurrences) and using sorted function on tuple[1].

def countNext(word):
    d = {}
    word = word.lower()
    for i in range(len(word) - 1):
        c = word[i]
        cc = word[i+1]
        if(not c.isalpha() or not cc.isalpha()):
            continue
        if c in d:
            if cc in d[c]:
                d[c][cc] += 1
            else:
                d[c][cc] = 1
        else:
            d[c] = {}
            d[c][cc] = 1
    return d
Stoicas
  • 101
  • 4