I would like to know the mathematical justification for using iterated conditional modes (ICM) as an approximation in the E-step of an EM algorithm.
As I understand it, the E-step does one of two things. It either finds a distribution equal to the posterior distribution of the latent variables, which guarantees that the likelihood does not decrease, or it finds the best possible distribution within some simpler family of distributions, which guarantees only that a lower bound on the likelihood does not decrease.
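For reference, both cases follow from the standard decomposition of the log-likelihood. Here x denotes the observed data, z the latent variables, θ the parameters and q(z) an arbitrary distribution over z; this notation is my own choice and is not taken from a specific source:

```latex
\log p(x \mid \theta)
  = \underbrace{\sum_{z} q(z) \log \frac{p(x, z \mid \theta)}{q(z)}}_{\mathcal{L}(q,\,\theta)}
  + \underbrace{\sum_{z} q(z) \log \frac{q(z)}{p(z \mid x, \theta)}}_{\mathrm{KL}\left(q \,\|\, p(z \mid x, \theta)\right) \,\ge\, 0}
```

Setting q(z) = p(z | x, θ_old) makes the KL term zero, so the bound is tight at θ_old and the following M-step cannot decrease log p(x | θ). Restricting q to a simpler family and maximizing 𝓛(q, θ) over that family guarantees only that 𝓛 does not decrease.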
How does one mathematically justify using ICM in such an E-step? Any references, derivations or notes would be very helpful.