I'm trying to train a Hidden Markov Model using hmmlearn. I'm new to Python, and I have trouble understanding the difference between lists and arrays. Here is my code:

from hmmlearn import hmm
from babel import lists
import numpy as np
import unidecode as u
from numpy import char

l = []
data = []
gods_egypt = ["Amon","Anat","Anouket","Anubis","Apis","Atoum","Bastet","Bès","Gheb","Hâpy","Harmachis","Hathor","Heh","Héket","Horus","Isis","Ka","Khepri","Khonsou","Khnoum","Maât","Meresger","Mout","Nefertoum","Neith","Nekhbet","Nephtys","Nout","Onouris","Osiris","Ouadjet","Oupaout","Ptah","Rê","Rechef","Renenoutet","Satet","Sebek","Sekhmet","Selkis","Seth","Shou","Sokaris","Tatenen","Tefnout","Thot","Thouéris"]
for i in range(0, len(gods_egypt)):
    data.append([])
    for j in range(0, len(gods_egypt[i])):
        data[i].append([u.unidecode(gods_egypt[i][j].lower())])
    l.append(len(data[i]))
data = np.asarray(data).reshape(-1,1)
model = hmm.MultinomialHMM(20, verbose=True)
model = model.fit(data, l)

And here is the resulting output:

Traceback (most recent call last):
  File "~~~\HMM_test.py", line 17, in <module>
    model = model.fit(data, l)
  File "~~~\Python\Python36\site-packages\hmmlearn\base.py", line 420, in fit
    X = check_array(X)
  File "~~~\Python36-32\lib\site-packages\sklearn\utils\validation.py", line 402, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.

I have read in the question "ValueError: setting an array element with a sequence" that the cause might be arrays of different lengths, but I can't figure out how to solve it.

Any suggestions?

Kahsius

1 Answer

The error itself comes from the fact that model.fit() expects a 2-D array of numerical values. Right now your input data is an array whose elements are lists of lists of strings. This provokes the error: where the function expects a single numerical array element, it finds a sequence, i.e. the list (of lists of strings).

However, even if you fix the list issue, another issue will arise: learning an HMM means computing numerical quantities via a set of equations, so the input data should be numerical, not a set of letters (unless hmmlearn has a very special option for characters that I am not aware of).

You need to first transform the letters into numbers if you want to work with HMMs.
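As a minimal sketch of that transformation (the helper name `encode_names` and the short name list are hypothetical, chosen only for illustration), each distinct character can be mapped to an integer, and all names can be flattened into one column of symbols together with a list of per-name lengths:

```python
# Hypothetical helper, not part of hmmlearn: encodes letters as integers.
def encode_names(names):
    """Map each distinct character to an integer and flatten all names."""
    # Sorted set of every character that occurs in the input names.
    alphabet = sorted({ch for name in names for ch in name})
    char_to_int = {ch: i for i, ch in enumerate(alphabet)}
    # One row per character, one column per row: a (n_samples, 1) layout.
    X = [[char_to_int[ch]] for name in names for ch in name]
    # Length of each name, so the sequences can be told apart later.
    lengths = [len(name) for name in names]
    return X, lengths, alphabet


names = ["amon", "anat", "isis"]  # illustrative, already lower-cased
X, lengths, alphabet = encode_names(names)
print(X[:4], lengths, alphabet)
```

Here `X` is a plain list of one-element rows; `np.array(X)` would presumably give the integer column array to pass to `fit` together with `lengths`. Which model class accepts such categorical symbols depends on the installed hmmlearn version, so check the documentation before relying on it.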

I do not know what your end goal is. HMMs are aimed at modeling data for generation or for classification (the latter if several HMMs are trained). What do you intend to do once you have a model trained on the letters composing the words?

As for the format in which the data should be provided to the different functions, I suggest that you have a look at the documentation. It includes tutorials on how to use the library.

Eskapp
  • Thank you for your answer. Indeed, I assumed that the HMM in `hmmlearn` was able to generate any kind of symbols, not only numbers, my bad. Wouldn't it be a problem to generate ordinal numbers rather than cardinal letters? I will try to investigate this problem. To answer your question, the point of this code is to learn the sequences of letters in a dictionary of names, in order to get a model that would allow me to generate random names with the same intrinsic construction as the learned ones. – Kahsius Aug 17 '17 at 23:32
  • Yes, assigning a number to each letter could be a first idea to try out: 1 -> A, 2 -> B, and so on, skipping the letters you do not want to appear. – Eskapp Aug 20 '17 at 18:14
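
The numbering suggested in the comments could be sketched roughly as follows. The names `letter_to_int`, `int_to_letter`, `encode` and `decode` are hypothetical, and the 1-based numbering simply follows the comment; a library that expects symbols numbered from 0 would need `start=0` instead.

```python
import string

# 1 -> a, 2 -> b, ... as suggested in the comment (hypothetical names).
letter_to_int = {c: i for i, c in enumerate(string.ascii_lowercase, start=1)}
# Reverse table, used to turn generated numbers back into letters.
int_to_letter = {i: c for c, i in letter_to_int.items()}


def encode(word):
    """Turn a lower-case ASCII word into a list of integers."""
    return [letter_to_int[c] for c in word]


def decode(symbols):
    """Turn a list of integers back into a word."""
    return "".join(int_to_letter[i] for i in symbols)


encoded = encode("isis")
print(encoded, decode(encoded))
```

Since the integers are only labels for categories, their order carries no meaning for the model; the reverse table is what would let sampled symbol sequences be read back as names.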