Is there any way to apply a transition matrix to strings in either Python or R?

Question

I have the following lines:

johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)

These are fitted distributions and their parameters for some data. We want to compute a transition matrix over this sequence to obtain the transition probabilities. We tried several different pieces of code, but we always get errors because of the data type (the states are strings).

We tried the solutions from these posts:

Generating Markov transition matrix in Python

Building a Transition Matrix using words in Python/Numpy

Calculate transition matrix of letters

The best solution we have tried until now:

import pandas as pd
# transitions: a list of strings like the ones above (the real list is much longer)
df = pd.DataFrame(columns = ['state', 'next_state'])
for i, val in enumerate(transitions[:-1]): # We don't care about last state
    df_stg = pd.DataFrame(index=[0])
    df_stg['state'], df_stg['next_state'] = transitions[i], transitions[i+1]
    df = pd.concat([df, df_stg], axis = 0)
cross_tab = pd.crosstab(df['state'], df['next_state'])
cross_tab.div(cross_tab.sum(axis=1), axis=0)
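The same table can be built more compactly. The following is only a sketch, assuming `transitions` is a plain Python list of state labels; the short labels are placeholders for the long distribution strings, in the same order as the six sample lines above. Each state is paired with its successor by slicing, and `pd.crosstab` normalizes each row via `normalize='index'`.

```python
import pandas as pd

# Placeholder labels standing in for the long distribution strings.
transitions = ['johnsonsu', 'johnsonsu', 'gausshyper',
               'gausshyper', 'gausshyper', 'johnsonsu']

# Pair every state with the state that follows it.
state = pd.Series(transitions[:-1], name='state')
next_state = pd.Series(transitions[1:], name='next_state')

# normalize='index' divides each row by its row total,
# so every row of the result sums to 1.
matrix = pd.crosstab(state, next_state, normalize='index')
print(matrix)
```

Because the labels are ordinary strings, no numeric encoding of the states is needed.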

Result (the raw counts in cross_tab):

state   alpha(a=1.10, loc=-94626.86, scale=1135344.81)  dgamma(a=0.61, loc=820000.00, scale=1885232.33) dgamma(a=0.78, loc=780000.00, scale=349653.54)  dgamma(a=0.81, loc=761200.00, scale=404939.11)  dweibull(c=0.77, loc=730000.00, scale=356863.56)    dweibull(c=0.90, loc=700000.00, scale=375807.48)    foldcauchy(c=2.59, loc=1423.70, scale=313236.41)    gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)  gennorm(beta=0.12, loc=725000.01, scale=0.00)   gennorm(beta=0.19, loc=545200.00, scale=38.09)  gennorm(beta=0.33, loc=575900.00, scale=7595.02)    gennorm(beta=0.33, loc=580090.00, scale=9423.99)    gennorm(beta=0.34, loc=532822.50, scale=7547.83)    gennorm(beta=0.42, loc=750000.00, scale=22359.35)   gennorm(beta=0.47, loc=666600.00, scale=42042.13)   johnsonsu(a=-0.02, b=0.50, loc=770186.45, scale=32359.52)   johnsonsu(a=-0.49, b=0.40, loc=561967.63, scale=65812.06)   johnsonsu(a=0.31, b=0.47, loc=835025.10, scale=53272.01)    johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)   loglaplace(c=1.63, loc=-927.08, scale=640927.08)    loglaplace(c=2.42, loc=-1009.51, scale=773124.55)   pearson3(skew=2.13, loc=908886.62, scale=577310.56) t(df=0.08, loc=700000.00, scale=1.71)   vonmises_line(kappa=2.01, loc=741142.93, scale=449091.04)
alpha(a=1.10, loc=-94626.86, scale=1135344.81)  19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
dgamma(a=0.61, loc=820000.00, scale=1885232.33) 0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
dgamma(a=0.78, loc=780000.00, scale=349653.54)  0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0
dgamma(a=0.81, loc=761200.00, scale=404939.11)  0   0   0   19  0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
dweibull(c=0.77, loc=730000.00, scale=356863.56)    0   0   0   0   19  0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
dweibull(c=0.90, loc=700000.00, scale=375807.48)    0   0   0   0   0   19  0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
foldcauchy(c=2.59, loc=1423.70, scale=313236.41)    0   0   0   0   1   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
gausshyper(a=1.50, b=0.67, c=2.50, z=3.68, loc=77873.97, scale=2249451.03)  0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0
gennorm(beta=0.12, loc=725000.01, scale=0.00)   0   0   0   1   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.19, loc=545200.00, scale=38.09)  0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   1   0   0
gennorm(beta=0.33, loc=575900.00, scale=7595.02)    0   0   0   0   0   1   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.33, loc=580090.00, scale=9423.99)    0   0   0   0   0   0   0   0   0   0   0   19  1   0   0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.34, loc=532822.50, scale=7547.83)    0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   1   0   0   0   0   0   0   0
gennorm(beta=0.42, loc=750000.00, scale=22359.35)   0   0   0   0   0   0   1   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0   0   0
gennorm(beta=0.47, loc=666600.00, scale=42042.13)   0   0   0   0   0   0   0   0   0   0   0   1   0   0   19  0   0   0   0   0   0   0   0   0
johnsonsu(a=-0.02, b=0.50, loc=770186.45, scale=32359.52)   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0   0   0
johnsonsu(a=-0.49, b=0.40, loc=561967.63, scale=65812.06)   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   19  0   0   0   0   0   0   0
johnsonsu(a=0.31, b=0.47, loc=835025.10, scale=53272.01)    0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0   0
johnsonsu(a=0.35, b=0.76, loc=973796.40, scale=134834.36)   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   19  0   0   0   0   0
loglaplace(c=1.63, loc=-927.08, scale=640927.08)    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   19  0   0   0   0
loglaplace(c=2.42, loc=-1009.51, scale=773124.55)   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  0   0   0
pearson3(skew=2.13, loc=908886.62, scale=577310.56) 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   19  0   0
t(df=0.08, loc=700000.00, scale=1.71)   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19  1
vonmises_line(kappa=2.01, loc=741142.93, scale=449091.04)   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   19

The probabilities are wrong. The code above outputs 0 for most entries of the transition matrix, but wherever the row label and the column label are the same, the value is 19.

@TobiasWilfert The post has been updated with the solutions in different links — Abdulaziz Al Jumaia, Dec 18 '18 at 14:26
The question is unclear. Please provide your imports, your errors and more details. — jason m, Dec 18 '18 at 14:33
The last code outputs 0 for the most values in the transition matrix. Yet, if the index and the column are similar to each other their value becomes 19. @jasonm — Abdulaziz Al Jumaia, Dec 18 '18 at 14:38

score 0 · Accepted Answer · answered Dec 19 '18 at 00:10

I have solved the problem. I noticed that the output contained only 19s and zeros because the data was not shuffled. After shuffling the data and rerunning the code, the results are as desired.
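The 19s on the diagonal can be reproduced with a toy sequence. Every row of the table in the question sums to 19 or 20, which is consistent with the input consisting of blocks of 20 identical entries; this is an inference from the table, not something the question states. A block of 20 identical states contains 19 transitions from the state to itself and at most one transition into the next block. The labels `'X'`, `'Y'` and `'Z'` below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three states, each repeated 20 times in a row.
transitions = ['X'] * 20 + ['Y'] * 20 + ['Z'] * 20

# Pair every state with the state that follows it.
state = pd.Series(transitions[:-1], name='state')
next_state = pd.Series(transitions[1:], name='next_state')

# Raw transition counts: 19 on the diagonal, 1 where one block ends.
counts = pd.crosstab(state, next_state)
print(counts)
```

Shuffling the list before counting breaks up these blocks, which is why the diagonal stops dominating. It also discards the original order, so it is only appropriate when that order carries no meaning.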

In this example, I use single characters instead of the distributions and their parameters to keep things simple.

transitions = ['A', 'B', 'B', 'C', 'A', 'A', 'A', 'Z']
from itertools import islice

def window(seq, n=2):
    """Sliding window of width n over seq. From the old itertools recipes."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

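import pandas as pd

# One row per consecutive pair (state1 -> state2)
pairs = pd.DataFrame(window(transitions), columns=['state1', 'state2'])
# Count how often each state2 follows each state1
counts = pairs.groupby('state1')['state2'].value_counts()
# Divide by the total number of transitions, then pivot state2 into columns
probs = (counts / counts.sum()).unstack()

# Pairs that never occur show up as NaN; replace them with 0
df = probs.fillna(0)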

The results:

state2         A         B         C         Z
state1                                        
A       0.285714  0.142857  0.000000  0.142857
B       0.000000  0.142857  0.142857  0.000000
C       0.142857  0.000000  0.000000  0.000000
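Note that the table above divides every count by the total number of transitions (7), so its entries are joint relative frequencies and the rows do not sum to 1. If conditional transition probabilities are wanted, with each row summing to 1, one option is to normalize within each `state1` group. Here is a minimal sketch on the same toy list, using slicing and `zip` in place of the `window` helper:

```python
import pandas as pd

transitions = ['A', 'B', 'B', 'C', 'A', 'A', 'A', 'Z']

# One row per consecutive pair (state1 -> state2).
pairs = pd.DataFrame(list(zip(transitions[:-1], transitions[1:])),
                     columns=['state1', 'state2'])

# normalize=True divides each count by the size of its state1 group,
# so each row of the unstacked table sums to 1.
probs = (pairs.groupby('state1')['state2']
              .value_counts(normalize=True)
              .unstack(fill_value=0))
print(probs)
```

With this normalization, row A becomes 0.5, 0.25, 0 and 0.25 for columns A, B, C and Z, instead of 0.285714, 0.142857, 0 and 0.142857.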

reference: Brad Solomon [Solution]
