
I need to know how to use a hidden Markov model (HMM) on top of Apache Spark. It's not present in MLlib. Are there any alternatives?

Thanks

Elsayed

Sergei Lebedev

2 Answers


The best I can find is a two-year-old implementation on Spark.

You might want to investigate using something other than Spark or HMMs, or just bite the bullet and implement it yourself. Implementing the Viterbi algorithm is not particularly hard; here is my implementation from many years ago.
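To give a sense of how small a hand-rolled version can be, here is a minimal single-machine sketch of the Viterbi algorithm in plain Python. It is not the implementation linked above; the function name `viterbi`, the dictionary-based model layout, and the Healthy/Fever toy model are all assumptions made for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (log_prob, path) for the most likely hidden-state sequence.

    Illustrative sketch only: the model is passed as plain dicts of
    probabilities, and log-space is used to avoid underflow.
    """
    if not observations:
        return 0.0, []

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model, used only to exercise the function.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

log_prob, path = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path)  # ['Healthy', 'Healthy', 'Fever']
```

Decoding one sequence is inherently sequential, so on Spark the natural unit of parallelism would probably be whole sequences (one call per record) rather than the time steps inside a single sequence.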

placeybordeaux

HMM algorithm - excerpts from https://en.wikipedia.org/wiki/Hidden_Markov_model

A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (i.e. hidden) states. A hidden Markov model can be represented as the simplest dynamic Bayesian network.

A hidden Markov model can be considered a generalization of a mixture model where the hidden variables (or latent variables), which control the mixture component to be selected for each observation, are related through a Markov process rather than independent of each other.

Applying the principle of dynamic programming, the problem of computing the probability of an observed sequence can be handled efficiently using the forward algorithm.
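As a rough illustration of that dynamic-programming idea, the forward recursion can be sketched in a few lines of plain Python. The function name `forward`, the dict-based model layout, and the Healthy/Fever toy model are assumptions for this example and do not come from the Wikipedia article or from any Spark library.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Return P(observations | model), summed over all hidden-state paths.

    Illustrative sketch only: it works with raw probabilities, so very long
    sequences would need scaling or log-space arithmetic to avoid underflow.
    """
    if not observations:
        return 1.0

    # alpha[s]: probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}

    for obs in observations[1:]:
        # Each step reuses the previous alpha values instead of enumerating
        # every possible state path, which is the dynamic-programming step.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }

    return sum(alpha.values())


# Hypothetical toy model, used only to exercise the function.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

likelihood = forward(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(round(likelihood, 5))  # 0.03628
```

The cost is proportional to the sequence length times the square of the number of states, compared with the exponential cost of summing over every state path explicitly.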

I have not seen algorithms built around these concepts implemented on Spark.

Spark can support "beyond map-reduce" algorithms, but the only project involving dynamic programming that I could find was https://github.com/bbengfort/brisera, which describes itself as:

A Python implementation of a distributed seed and reduce algorithm (similar to BlastReduce and CloudBurst) that utilizes RDDs (resilient distributed datasets) to perform fast iterative analyses and dynamic programming without relying on "chained MapReduce jobs".

Mahout has an HMM implementation, but I am unsure whether it is distributed: https://mahout.apache.org/users/classification/hidden-markov-models.html

SemanticBeeng