I am trying to perform a classification procedure where my training data looks like this:
(state, (feature_1, feature_2, feature_3, ..., feature_n))
Thus, given a set of features, I need to predict what state/label/class those features most likely correspond to.
I already have CRFSuite set up, which makes building CRFs very fast, but is a CRF really ideal for this kind of learning? I have used CRFs in the past for sequences of states, where the label of the $n$th state may also depend on the labels/features of the previous $n-1$ states. For example, here is a training sequence I used to try to predict a child's phonetic output given the adult IPA transcription:
e Adult=e __BOS__
i Adult=-
d Adult=d
r Adult=-
i Adult=i
ə Adult=-
n Adult=- __EOS__
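For reference, here is a minimal sketch of how I read data like the above into sequences of (label, attributes) items. It assumes whitespace-separated fields with the label first and a blank line between sequences, which is my understanding of the CRFSuite text format; `parse_sequences` is just a hypothetical helper name, not part of CRFSuite.

```python
def parse_sequences(text):
    """Split label/attribute lines into sequences of (label, attributes) items.

    Assumption: the first field on each line is the label, the remaining
    fields are attributes, and a blank line ends the current sequence.
    """
    sequences, current = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current sequence, if there is one.
            if current:
                sequences.append(current)
                current = []
            continue
        current.append((fields[0], fields[1:]))
    if current:
        sequences.append(current)
    return sequences


sample = """\
e Adult=e __BOS__
i Adult=-
d Adult=d
r Adult=-
i Adult=i
ə Adult=-
n Adult=- __EOS__
"""

sequences = parse_sequences(sample)
print(len(sequences), "sequence(s); first item:", sequences[0][0])
```

Under the same assumption, my new classification data would become one single-item sequence per training example, each separated by a blank line.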
A CRF makes sense for this data because phonology/phonetics is very regular: the sound chosen strongly affects future sound choices, e.g. a vowel will probably be followed by a consonant and not another vowel.
I believe I understand that a CRF is essentially a sequential form of a Maxent model. So if all my training sequences have length $1$, will I basically just have a Maxent model that happens to be called a CRF?
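To make the question concrete, here is a minimal sketch of what I mean, assuming the usual linear-chain parameterization in which a label sequence is scored by per-position state scores plus scores for each pair of adjacent labels. With a single position there are no adjacent pairs, so the distribution should reduce to $P(y \mid x) = \exp(s_y) / \sum_{y'} \exp(s_{y'})$, which is the Maxent form. The numbers and function names below are made up for illustration, and the state scores stand in for the weighted feature sums; this is not how CRFSuite computes anything internally.

```python
import math
from itertools import product


def crf_sequence_probs(state_scores, transition_scores):
    """Brute-force distribution of a linear-chain CRF over all label sequences.

    state_scores[t][y]      : score of label y at position t
    transition_scores[a][b] : score of label a being followed by label b
    """
    n_labels = len(transition_scores)
    length = len(state_scores)
    scores = {}
    for labels in product(range(n_labels), repeat=length):
        total = sum(state_scores[t][y] for t, y in enumerate(labels))
        # Adjacent-label terms; this sum is empty when the length is 1.
        total += sum(transition_scores[a][b] for a, b in zip(labels, labels[1:]))
        scores[labels] = total
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {labels: math.exp(s - log_z) for labels, s in scores.items()}


def maxent_probs(label_scores):
    """Softmax over per-label scores, i.e. a Maxent classifier's output."""
    log_z = math.log(sum(math.exp(s) for s in label_scores))
    return [math.exp(s - log_z) for s in label_scores]


# Illustrative scores for 3 labels and one length-1 "sequence".
state_scores = [[1.2, -0.3, 0.5]]
transition_scores = [[0.7, -1.0, 0.2], [0.1, 0.4, -0.6], [-0.2, 0.9, 0.3]]

crf = crf_sequence_probs(state_scores, transition_scores)
maxent = maxent_probs(state_scores[0])
max_diff = max(abs(crf[(y,)] - p) for y, p in enumerate(maxent))
print("largest difference between CRF and Maxent probabilities:", max_diff)
```

In this sketch the transition scores never enter the length-1 computation, which is why I suspect the two models coincide. An implementation that adds extra begin/end-of-sequence terms might behave slightly differently.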
The question "CRF for named entity recognition" discusses a similar use of CRFs, but I am guessing it uses sequences of states?