Here we introduce the Expectation Conditional Maximization (ECM) algorithm of Meng and Rubin (1993), motivating it with a typical example. I’ll also add some thoughts about other natural extensions at the end.

Let observations $Y_i \sim N(X_i\beta, \Sigma)$ be sampled independently, where the $X_i$’s are given features for each $i = 1, \ldots, n$. Let both the linear coefficients $\beta$ and the covariance $\Sigma$ be unknown, and say for each $Y_i$ only some values are observed, i.e., there is missing data. Partition the observations as $Y = (Y_{obs}, Y_{mis})$, where $Y_{obs}$ represents all those observed and $Y_{mis}$ the missing data.

Now consider the problem of calculating the maximum likelihood estimate (MLE) of $\theta = (\beta, \Sigma)$. This has no closed-form expression, so let’s consider the EM algorithm (Dempster et al., 1977) instead:

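  1. Initialize $\theta^{(0)}$.
  2. While $\theta^{(t)}$ has not converged, i.e., $\|\theta^{(t+1)} - \theta^{(t)}\| > \epsilon$ for some tolerance $\epsilon$:
    1. E-step: Calculate the expectation of the sufficient statistics, conditional on the observed data and the current parameter values:
       $$E[Y_i \mid Y_{obs}, \theta^{(t)}], \qquad E[Y_i Y_i^\top \mid Y_{obs}, \theta^{(t)}].$$
    2. M-step: Substitute the above into the expressions for the sufficient statistics in the complete-data MLE to obtain $\theta^{(t+1)}$.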
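
For a Gaussian model of this kind, the E-step expectations follow from the standard conditional-normal formulas. As a sketch, assume $Y_i \sim N(\mu_i, \Sigma)$ with $\mu_i = X_i\beta^{(t)}$ and $\Sigma = \Sigma^{(t)}$, and write $o$ and $m$ for the observed and missing coordinates of $Y_i$ (the subscripts $o$, $m$ are notation introduced here for illustration). Then

$$
\begin{aligned}
E[Y_{i,m} \mid Y_{i,o}, \theta^{(t)}] &= \mu_{i,m} + \Sigma_{mo}\Sigma_{oo}^{-1}\,(Y_{i,o} - \mu_{i,o}), \\
\mathrm{Cov}[Y_{i,m} \mid Y_{i,o}, \theta^{(t)}] &= \Sigma_{mm} - \Sigma_{mo}\Sigma_{oo}^{-1}\Sigma_{om}, \\
E[Y_{i,m} Y_{i,m}^\top \mid Y_{i,o}, \theta^{(t)}] &= \mathrm{Cov}[Y_{i,m} \mid Y_{i,o}, \theta^{(t)}] + E[Y_{i,m} \mid Y_{i,o}, \theta^{(t)}]\,E[Y_{i,m} \mid Y_{i,o}, \theta^{(t)}]^\top .
\end{aligned}
$$

The observed coordinates are simply kept at their observed values, so they contribute no conditional covariance.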

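ECM is a natural extension of EM: it replaces the single maximization step over all of one’s parameters of interest with a sequence of conditional maximization (CM) steps, each of which maximizes over a subset of the parameters while conditioning on (i.e., holding fixed) the rest.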
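
To make this concrete, here is a minimal sketch of ECM for a Gaussian regression with missing responses, assuming the model $Y_i \sim N(X_i\beta, \Sigma)$ with each $X_i$ a $p \times q$ design matrix and missing entries of $Y$ coded as NaN. The function names (`e_step`, `ecm`), the starting values, and the stopping rule are illustrative choices, not taken from Meng and Rubin's paper. The first CM-step updates $\beta$ by generalized least squares with $\Sigma$ held fixed; the second updates $\Sigma$ with $\beta$ held fixed.

```python
import numpy as np

def e_step(Y, X, beta, Sigma):
    """Conditional means of the responses and conditional covariances of
    the missing parts, given the observed parts and current parameters."""
    n, p = Y.shape
    Y_hat = np.zeros((n, p))
    C = np.zeros((n, p, p))
    for i in range(n):
        mu = X[i] @ beta
        m = np.isnan(Y[i])          # missing coordinates
        o = ~m                      # observed coordinates
        Y_hat[i, o] = Y[i, o]
        if not m.any():
            continue
        if o.any():
            # K = Sigma_mo Sigma_oo^{-1}, computed via a linear solve
            K = np.linalg.solve(Sigma[np.ix_(o, o)], Sigma[np.ix_(o, m)]).T
            Y_hat[i, m] = mu[m] + K @ (Y[i, o] - mu[o])
            C[i][np.ix_(m, m)] = Sigma[np.ix_(m, m)] - K @ Sigma[np.ix_(o, m)]
        else:
            # Nothing observed in this row: fall back to the marginal
            Y_hat[i] = mu
            C[i] = Sigma
    return Y_hat, C

def ecm(Y, X, n_iter=200, tol=1e-6):
    """ECM sketch: one E-step followed by two CM-steps per iteration."""
    n, p = Y.shape
    q = X.shape[2]
    beta, Sigma = np.zeros(q), np.eye(p)   # arbitrary starting values
    for _ in range(n_iter):
        Y_hat, C = e_step(Y, X, beta, Sigma)
        # CM-step 1: maximize over beta with Sigma fixed (GLS)
        S_inv = np.linalg.inv(Sigma)
        A = np.einsum('ipq,pr,irs->qs', X, S_inv, X)
        b = np.einsum('ipq,pr,ir->q', X, S_inv, Y_hat)
        beta_new = np.linalg.solve(A, b)
        # CM-step 2: maximize over Sigma with beta fixed at its new value
        R = Y_hat - np.einsum('ipq,q->ip', X, beta_new)
        Sigma_new = (R.T @ R + C.sum(axis=0)) / n
        delta = max(np.abs(beta_new - beta).max(),
                    np.abs(Sigma_new - Sigma).max())
        beta, Sigma = beta_new, Sigma_new
        if delta < tol:
            break
    return beta, Sigma
```

Each CM-step has a closed form even though the joint maximization over $(\beta, \Sigma)$ generally does not, which is the point of splitting the M-step. Calling `ecm(Y, X)` with `Y` of shape `(n, p)` and `X` of shape `(n, p, q)` returns the pair of estimates.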

PxEM (parameter-expanded EM) is a way of incorporating additional information into one’s estimate by adjusting the current parameters based on some set of constraints you know must be true.

Intuitively, I see the advantage of this variant in the same way Stein’s estimator dominates the MLE in the simple example of three observations sampled independently from normals $X_i \sim N(\mu_i, \sigma^2)$, where the $\mu_i$’s are unknown for $i = 1, 2, 3$. It is easy to see that the MLE is $\hat\mu_i = X_i$. However, this estimator is inadmissible; it is dominated by the estimate … While this seems paradoxical, what is really happening is that all observations are sampled from a normal distribution with the same variance. By taking advantage of the fact that all observations share this attribute, you can obtain a better estimate by introducing dependencies among the component estimates.
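
The dominance claim can be checked numerically. The sketch below assumes unit variance and uses the standard James-Stein form that shrinks toward the origin, $\hat\mu^{JS} = \left(1 - \frac{p-2}{\|X\|^2}\right) X$ with $p = 3$. This is one common version of Stein's estimator and may differ in detail from the form intended above. The function names and the choice of true means are illustrative.

```python
import numpy as np

def james_stein(x):
    """James-Stein shrinkage toward the origin, assuming x ~ N(mu, I_p), p >= 3."""
    p = x.shape[-1]
    shrink = 1.0 - (p - 2) / np.sum(x**2, axis=-1, keepdims=True)
    return shrink * x

def simulated_risks(mu, n_rep=100_000, seed=0):
    """Monte Carlo estimates of total squared-error risk for the MLE and James-Stein."""
    mu = np.asarray(mu, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one draw of the p observations, one per unknown mean
    x = rng.normal(loc=mu, scale=1.0, size=(n_rep, mu.size))
    risk_mle = np.mean(np.sum((x - mu) ** 2, axis=1))
    risk_js = np.mean(np.sum((james_stein(x) - mu) ** 2, axis=1))
    return risk_mle, risk_js

risk_mle, risk_js = simulated_risks([1.0, -0.5, 2.0])
print(f"MLE risk: {risk_mle:.3f}, James-Stein risk: {risk_js:.3f}")
```

In theory the MLE's risk is exactly $p = 3$ under these assumptions, and the James-Stein risk is strictly smaller for every value of the true means, with the largest gain when they are close to the origin.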

References

  • Arthur Dempster, Nan Laird, and Donald Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39 (1): 1–38, 1977.
  • Xiao-Li Meng and Donald Rubin. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80 (2): 267–278, 1993.