I'm studying Markov Random Fields, and, apparently, inference in MRF is hard / computationally expensive. Specifically, Kevin Murphy's book Machine Learning: A Probabilistic Perspective says the following:
"In the first term, we fix y to its observed values; this is sometimes called the clamped term. In the second term, y is free; this is sometimes called the unclamped term or contrastive term. Note that computing the unclamped term requires inference in the model, and this must be done once per gradient step. This makes training undirected graphical models harder than training directed graphical models."
Why are we performing inference here? I understand that we're summing over all y's, which seems expensive, but I don't see where we're actually estimating any parameters. Wikipedia also talks about inference, but only talks about calculating the conditional distribution, and needing to sum over all non-specified nodes.. but.. that's not what we're doing here, is it?
Alternatively, any have good intuition on why inference in MRF is difficult?
Sources: Chapter 19 of ML:PP: https://www.cs.ubc.ca/~murphyk/MLbook/pml-print3-ch19.pdf
Specific section seen below