Guy who wrote the article speaking. Glad you found it useful! Now, my first implementation of a Markov chain was actually in Python, so this answer will focus on how to write it in a more Pythonic way. I'll show how to go about making an order-2 Markov chain, since they're easy to talk about, but you can of course make it order-N with some modifications.
Data Structures
In js, the two prominent data structures are the generic object and the array (which is an extension of the generic object). In Python, however, you have other options for more finely-grained control. Here are the major differences between the two implementations:
A state in our chain is really a tuple - an immutable, ordered structure with a fixed number of elements. We always want n elements (in this case, n=2) and their order has meaning.
Manipulating the memory will be easier if we use a defaultdict wrapping a list, so we can skip the "check whether a state exists yet, and only then do X" dance, and instead just do X.
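If you haven't met defaultdict before, here's a quick interpreter taste of that "just do X" behaviour (standard library only, nothing chain-specific yet):
>>> from collections import defaultdict
>>> memory = defaultdict(list)
>>> memory[('', '')].append('Mary')   # no KeyError - the empty list is created on first access
>>> memory[('', '')]
['Mary']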
So, we stick a from collections import defaultdict at the top and change how markov.memory is defined:
memory = defaultdict(list)
Now we change markov.getInitial to return a tuple (remember, this explains an order-2 chain):
def getInitial(self):
    return ('', '')
(If you want to expand it further, you can use a really neat Python trick: tuple([''] * 2) will return the same thing. Instead of empty strings, you can use None.)
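To make that concrete, here's a hedged sketch of the order-N version - assuming the chain keeps its order in a hypothetical self.order attribute, which the class below doesn't actually have:
def getInitial(self):
    # hypothetical order-N variant; self.order is an assumption, not part of the class below
    return tuple([''] * self.order)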
We'll get to changing what uses getInitial in a bit.
Yield and iteration
A strong concept which doesn't exist in js (yet) but does exist in Python is the yield keyword (see this question for great explanations).
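In case you've never seen it, here's a throwaway generator (nothing to do with the chain) just to show the mechanics:
>>> def countdown(n):
...     while n:
...         yield n        # hand back one value, pause here until asked for the next
...         n -= 1
...
>>> list(countdown(3))
[3, 2, 1]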
Another feature of Python is its generic for loop. You can go over nearly anything quite easily, including generators (functions which use yield). Combining the two, we can redefine breakText:
def breakText(self, txt):
    # our very own (ε,ε)
    prev = self.getInitial()
    for word in txt.split(self.separator):
        yield prev, word
        # will be explained in the next paragraph
        prev = (prev[1], word)
    # end-of-sentence, prev->ε
    yield prev, ''
The magic part above, prev = (prev[1], word), can be explained best by example:
>>> a = (0, 1)
>>> a
(0, 1)
>>> a = (a[1], 2)
>>> a
(1, 2)
That's how we advance through the word list. And now we move up to what uses breakText, to the redefinition of markov.learn:
def learn(self, txt):
    for part in self.breakText(txt):
        key = part[0]
        value = part[1]
        self.memory[key].append(value)
Because our memory is a defaultdict, we don't have to worry about the key not existing.
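As an extra Pythonic touch (just a sketch, equivalent to the loop above), the part[0]/part[1] indexing can be replaced with tuple unpacking right in the for statement:
def learn(self, txt):
    for key, value in self.breakText(txt):   # unpack the (state, word) pair directly
        self.memory[key].append(value)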
A pee break on the side of the road
OK, we have half of the chain implemented, time to see it in action! What we have so far:
from collections import defaultdict

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]
            self.memory[key].append(value)

    def breakText(self, txt):
        # our very own (ε,ε)
        prev = self.getInitial()
        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)
        # end-of-sentence, prev->ε
        yield (prev, '')

    def getInitial(self):
        return ('', '')
(I changed the class name from markov to Markov because I cringe every time a class begins with a lowercase letter.) I saved it as brain.py and loaded up Python.
>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.memory
defaultdict(<class 'list'>, {('had', 'a'): ['little'], ('Mary', 'had'): ['a'], ('', ''): ['Mary'], ('little', 'lamb'): [''], ('a', 'little'): ['lamb'], ('', 'Mary'): ['had']})
Success! Let's look at the result more carefully, to see that we got it right:
{ ('', ''): ['Mary'],
('', 'Mary'): ['had'],
('Mary', 'had'): ['a'],
('a', 'little'): ['lamb'],
('had', 'a'): ['little'],
('little', 'lamb'): ['']}
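(I sorted that listing by hand; if you'd rather have Python do it, pprint from the standard library will sort the keys and print one state per line - a side note, nothing chain-specific:)
>>> from pprint import pprint
>>> pprint(dict(bob.memory))   # prints roughly the listing above, keys sorted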
(zips up) Ready to drive on? We still have to use this chain!
Changing the step function
We've already met everything we need to remake step. We have the defaultdict, so we can use random.choice right away, and I can cheat a bit because I know the order of the chain. We can also get rid of the recursion (with some sorrow) if we see it as a function which takes a single step through the chain (my bad in the original article - a badly named function).
def step(self, state):
    choice = random.choice(self.memory[state] or [''])
    if not choice:
        return None
    nextState = (state[1], choice)
    return choice, nextState
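For reference, this is the failure mode that or [''] guards against (the exact wording varies a bit between Python versions):
>>> import random
>>> random.choice([])
Traceback (most recent call last):
  ...
IndexError: Cannot choose from an empty sequence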
I regretfully added the or [''] because random.choice moans about empty lists. Finally, we move a larger portion of the logic to ask (the actual construction of the sentence):
def ask(self, seed=False):
    ret = []
    if not seed:
        seed = self.getInitial()
    while True:
        link = self.step(seed)
        if link is None:
            break
        ret.append(link[0])
        seed = link[1]
    return self.separator.join(ret)
Yes, a bit yucky. We could have given step a better name and made it a generator, but I'm late for a meeting with my pregnant wife who's about to give birth to a baby who left the stove on fire in my car that's being towed! I better hurry!
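(For the curious, here's roughly what that generator version could look like - a sketch only, with walk being a name I just made up for it, keeping the same defaultdict and or [''] behaviour as step above:)
def walk(self, seed=None):
    # hypothetical generator replacement for step: yields words until the chain reaches ε
    state = seed or self.getInitial()
    while True:
        choice = random.choice(self.memory[state] or [''])
        if not choice:
            return   # hit the end-of-sentence marker, stop the generator
        yield choice
        state = (state[1], choice)

def ask(self, seed=None):
    return self.separator.join(self.walk(seed))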
The grand finale
But first, a talk with bob:
from collections import defaultdict
import random

class Markov:
    memory = defaultdict(list)
    separator = ' '

    def learn(self, txt):
        for part in self.breakText(txt):
            key = part[0]
            value = part[1]
            self.memory[key].append(value)

    def ask(self, seed=False):
        ret = []
        if not seed:
            seed = self.getInitial()
        while True:
            link = self.step(seed)
            if link is None:
                break
            ret.append(link[0])
            seed = link[1]
        return self.separator.join(ret)

    def breakText(self, txt):
        # our very own (ε,ε)
        prev = self.getInitial()
        for word in txt.split(self.separator):
            yield prev, word
            prev = (prev[1], word)
        # end-of-sentence, prev->ε
        yield (prev, '')

    def step(self, state):
        choice = random.choice(self.memory[state] or [''])
        if not choice:
            return None
        nextState = (state[1], choice)
        return choice, nextState

    def getInitial(self):
        return ('', '')
And loading it up:
>>> import brain
>>> bob = brain.Markov()
>>> bob.learn('Mary had a little lamb')
>>> bob.ask()
'Mary had a little lamb'
>>> bob.learn('Mary had a giant crab')
>>> bob.ask(('Mary', 'had'))
'a giant crab'
There is, of course, room for improvement and for expanding on the concept. But it wouldn't be any fun if I just gave you the answer.
Hopefully this will still help after 4 months.