Pandas - Calculate probability of sequence from Markov chain df

Question

I want to calculate the probability of several sequences in a Markov Chain. I got the Markov Chain ready, but I am not sure how to easily calculate specific sequence probabilities.

My pandas DataFrame, called markov, has the states A-E as both the index (rows) and the columns, and looks as follows:

     A    B    C    D    E
A  0.3  0.2  0.5  0.0  0.2
B  0.2  0.4  0.0  0.0  0.4
C  0.5  0.4  0.0  0.1  0.0
D  0.2  0.2  0.2  0.2  0.2
E  0.6  0.1  0.1  0.1  0.1

Let's assume I want to check the probability of a sequence stored in a list called sequence: ['A', 'C', 'D']. This corresponds to the transitions A to C and C to D, so it should result in 0.05.

I succeeded by using the pandas .at accessor:

markov.at[sequence[0], sequence[1]] * markov.at[sequence[1], sequence[2]]
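
Written out for a sequence of any length, and assuming the chain is already in the first state (so no initial-state probability enters), this product is the following. The symbol M is introduced only for this note and stands for the markov table, with M_{i,j} the entry in row i and column j:

```latex
P(s_1 \to s_2 \to \dots \to s_n \mid s_1) = \prod_{k=1}^{n-1} M_{s_k, s_{k+1}},
\qquad
P(A \to C \to D \mid A) = M_{A,C} \cdot M_{C,D} = 0.5 \times 0.1 = 0.05
```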

However, I would like to build a function that takes a table with one sequence per row, where the sequences vary in length, and calculates the probability of each sequence. With my current approach, I have to alter the code manually each time I want to check a specific sequence.

How could I achieve this? Am I overlooking a built-in feature of pandas for such calculations?

score 1 · Accepted Answer · answered Jun 17 '19 at 14:22

You could define a function like this:

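def get_prob(*args):
    # Multiply the transition probabilities of each consecutive pair of states.
    # Relies on the global DataFrame `markov` from the question.
    ret = 1
    for i, j in zip(args, args[1:]):
        ret *= markov.at[i, j]  # row i = current state, column j = next state

    return ret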

And then call:

get_prob('A','C','D')
# 0.05

get_prob('A', 'C', 'D', 'E')
# 0.010000000000000002

Or you can do:

def get_prob2(lst):
    # Same product as get_prob, but takes one sequence (string or list).
    ret = 1
    for i, j in zip(lst, lst[1:]):
        ret *= markov.at[i, j]

    return ret

This version takes a single sequence, so you can pass a string (or a list):

get_prob2('ACDE')
# 0.010000000000000002
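
To cover the "table of sequences" part of the question, the same idea can be applied once per row. The following is a minimal sketch, assuming the sequences are stored as a pandas Series of lists; the names `sequence_prob` and `sequences`, and the example rows, are made up for illustration. The matrix values are copied from the question as posted.

```python
import pandas as pd

# Transition matrix copied from the question
# (rows = current state, columns = next state).
states = list("ABCDE")
markov = pd.DataFrame(
    [
        [0.3, 0.2, 0.5, 0.0, 0.2],
        [0.2, 0.4, 0.0, 0.0, 0.4],
        [0.5, 0.4, 0.0, 0.1, 0.0],
        [0.2, 0.2, 0.2, 0.2, 0.2],
        [0.6, 0.1, 0.1, 0.1, 0.1],
    ],
    index=states,
    columns=states,
)


def sequence_prob(seq, matrix):
    """Product of the transition probabilities along consecutive pairs in seq."""
    prob = 1.0
    for current, nxt in zip(seq, seq[1:]):
        prob *= matrix.at[current, nxt]
    return prob


# Hypothetical input: one sequence per entry, lengths may differ.
sequences = pd.Series([["A", "C", "D"], ["A", "C", "D", "E"], ["B", "E"]])

# Apply the function to each sequence; the result is a Series of probabilities.
probs = sequences.apply(lambda seq: sequence_prob(seq, markov))
print(probs)
```

Passing the matrix as a parameter instead of reading a global makes the function reusable with other transition tables. If the sequences are instead stored in a DataFrame padded with missing values, the padding would need to be dropped from each row first.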

Quang Hoang

Works great! Could you possibly explain the use of *args here? I always have trouble interpreting it. – intStdu Jun 17 '19 at 14:43
Basically, I think of `*args` as the tuple of all the positional (unnamed) arguments you pass to the function, as opposed to named arguments like `get_prob3(df=markov, walks='ABCD')`. You can find details [here](https://stackoverflow.com/questions/36901/what-does-double-star-asterisk-and-star-asterisk-do-for-parameters). – Quang Hoang Jun 17 '19 at 14:47
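
A small illustration of how positional and named arguments are collected may help here. The function `show_args` is a throwaway name made up for this sketch; it only returns what it receives.

```python
def show_args(*args, **kwargs):
    # args is a tuple of the positional arguments;
    # kwargs is a dict of the keyword (named) arguments.
    return args, kwargs


positional, named = show_args('A', 'C', 'D', df='markov')
print(positional)  # ('A', 'C', 'D')
print(named)       # {'df': 'markov'}

# At the call site, * does the reverse: it unpacks a string or list
# into separate positional arguments.
unpacked, _ = show_args(*'ACD')
print(unpacked)    # ('A', 'C', 'D')
```

This is why `get_prob('A', 'C', 'D')` and `get_prob(*'ACD')` would walk the same sequence of states.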
