Online Maximum Likelihood Estimation of (multivariate) Gaussian Distributions


This post is the first part of a series of five articles:

  1. Online Maximum Likelihood Estimation of (multivariate) Gaussian Distributions
  2. Online Estimation of Weighted Sample Mean and Covariance Matrix
  3. The Covariance of weighted Means
  4. Memory of the exponentially decaying Estimator for Mean and Covariance Matrix
  5. Online Estimation of the Inverse Covariance Matrix

\(
\def\myX{\mathbf{x}} \def\myY{\mathbf{y}} \def\mySigma{\mathbf{\Sigma}} \def\myM{\mathbf{M}} \def\myT{\mathsf{T}} \)

When one has to estimate the parameters (mean vector \(\bar{\myX}\) and covariance matrix \(\mySigma\)) of a multivariate Gaussian distribution, the following well-known relations – which represent the maximum likelihood estimates – are commonly used:

\begin{align}
\bar{\myX} &= \frac{1}{N} \sum_{i=1}^{N} \myX_i \label{eq:meanDef} \\
\mySigma &= \frac{1}{N} \sum_{i=1}^{N} \big(\myX_i - \bar{\myX}\big)\big(\myX_i - \bar{\myX}\big)^\myT \label{eq:SigmaDef}
\end{align}

where \(\myX_i\) is the \(i\)-th example vector of a sample of size \(N\).

The above two equations can be easily implemented for the offline case with already existing data. However, in practice, one often has to deal with data streams, where one example has to be processed at a time. This is where things get more difficult.

In such situations it is usually not feasible to repeatedly compute Eq. \eqref{eq:meanDef} and Eq. \eqref{eq:SigmaDef} from scratch for each newly collected data instance. It would make more sense to slightly adapt the previous estimates \(\bar\myX_{n}\) and \(\mySigma_{n}\) to fit the new point \(\myX_{n+1}\) as well, in order to get \(\bar\myX_{n+1}\) and \(\mySigma_{n+1}\).

Online Estimation of the Mean

For the mean vector, we can actually find such a solution easily:

\begin{equation}
\bar\myX_{n+1} = \bar\myX_{n} + \frac{\boldsymbol\delta_{n+1}}{n+1} \label{eq:onlineMean}
\end{equation}

where:

\begin{equation}
\boldsymbol\delta_{n+1} = \myX_{n+1} - \bar\myX_{n} \label{eq:DeltaN}
\end{equation}

The update-rule in Eq. \eqref{eq:onlineMean} in combination with Eq. \eqref{eq:DeltaN} should be sufficiently robust for most practical applications.

Online Estimation of the Covariance Matrix

Finding update rules for the online adaptation of the covariance matrix is slightly more difficult. However, once the math is done, the resulting equations are also rather simple.

One can start with the unnormalized sum of example covariances (the so-called scatter matrix) for a sample size of \(n\) (we initially do not consider the normalization factor):

\begin{equation}
\myM_{n} = \sum_{i=1}^{n} \big(\myX_i - \bar\myX_{n}\big)\big(\myX_i - \bar\myX_{n}\big)^\myT,
\qquad
M^{(jk)}_{n} = \sum_{i=1}^{n} \big(x^{(j)}_i - \bar x^{(j)}_{n}\big)\big(x^{(k)}_i - \bar x^{(k)}_{n}\big) \label{eq:scatterDef}
\end{equation}

where \(x^{(j)}_i\) is the \(j\)-th element of the vector \(\myX_i\) at time step \(i\).

For the covariance between two features \(j\) and \(k\), in order to update \(M^{(jk)}_{n}\) to \(M^{(jk)}_{n+1}\), one can compute the increment \(\Delta M^{(jk)}_{n+1} = M^{(jk)}_{n+1} - M^{(jk)}_{n}\):

\begin{equation}
\Delta M^{(jk)}_{n+1} = \sum_{i=1}^{n+1} \big(x^{(j)}_i - \bar x^{(j)}_{n+1}\big)\big(x^{(k)}_i - \bar x^{(k)}_{n+1}\big) - \sum_{i=1}^{n} \big(x^{(j)}_i - \bar x^{(j)}_{n}\big)\big(x^{(k)}_i - \bar x^{(k)}_{n}\big)
\end{equation}

Using the relations

\begin{equation}
\bar x^{(j)}_{n+1} = \bar x^{(j)}_{n} + \frac{x^{(j)}_{n+1} - \bar x^{(j)}_{n}}{n+1}
\end{equation}

and

\begin{equation}
\sum_{i=1}^{n} \big(x^{(j)}_i - \bar x^{(j)}_{n}\big) = 0
\end{equation}

we can further simplify:

\begin{equation}
\Delta M^{(jk)}_{n+1} = \big(x^{(j)}_{n+1} - \bar x^{(j)}_{n}\big)\big(x^{(k)}_{n+1} - \bar x^{(k)}_{n+1}\big)
\end{equation}

In vectorized form we finally get the following recursive relation (where \(\myX_{n+1}\) and \(\bar\myX_{n}\) are both column vectors):

\begin{equation}
\myM_{n+1} = \myM_{n} + \big(\myX_{n+1} - \bar\myX_{n}\big)\big(\myX_{n+1} - \bar\myX_{n+1}\big)^\myT \label{eq:DeltaMResult}
\end{equation}

In case one prefers to update the covariance matrix with “mini-batches” instead of single examples, this can be realized with the following equation:

\begin{equation}
\myM_{n+b} = \myM_{n} + \big(\mathbf{X}_B - \mathbf{1}_b\,\bar\myX_{n}^\myT\big)^\myT \big(\mathbf{X}_B - \mathbf{1}_b\,\bar\myX_{n+b}^\myT\big) \label{eq:miniBatch}
\end{equation}

where \(b\) is the size of a mini-batch, \(\mathbf{X}_B\) is the mini-batch with \(b\) rows, and \(\mathbf{1}_b\) is a column vector of \(b\) ones; the mean is updated accordingly with \(\bar\myX_{n+b} = \bar\myX_{n} + \frac{b}{n+b}\big(\bar\myX_B - \bar\myX_{n}\big)\), where \(\bar\myX_B\) is the mean of the mini-batch. Typically, the computation time can be reduced slightly when mini-batches are utilized. The unbiased covariance matrix can then be computed in every time step with:

\begin{equation}
\mySigma_{n} = \frac{1}{n-1}\,\myM_{n} \label{eq:SigmaFromM}
\end{equation}

Finding a simpler Formulation for the Covariance Matrix

However, the above formulation still appears to be slightly too complex. So we try again to find a nicer online update rule for the covariance matrix, by starting again from the original representation:

\begin{align}
\myM_{n} &= \sum_{i=1}^{n} \big(\myX_i - \bar\myX_{n}\big)\big(\myX_i - \bar\myX_{n}\big)^\myT \nonumber \\
&= \sum_{i=1}^{n} \myX_i\myX_i^\myT - n\,\bar\myX_{n}\bar\myX_{n}^\myT \label{eq:scatterExpanded}
\end{align}

This leads to the following covariance matrix

\begin{equation}
\mySigma_{n} = \frac{1}{n-1}\left(\sum_{i=1}^{n} \myX_i\myX_i^\myT - n\,\bar\myX_{n}\bar\myX_{n}^\myT\right)
\end{equation}

which can also be computed in vectorized form (replacing Bessel’s correction term \(n-1\) simply by \(n\)) if all example vectors are organized as rows in a matrix \(\mathbf{X}\) with \(n\) rows (just as a side note):

\begin{equation}
\mySigma_{n} \approx \frac{1}{n}\,\mathbf{X}^\myT\mathbf{X} - \bar\myX_{n}\bar\myX_{n}^\myT
\end{equation}

Further Simplifications

With the expanded representation of the scatter matrix from the previous section, we can actually find an even simpler online rule for updating the covariance matrix. Let us start again with \(\Delta\myM_{n+1} = \myM_{n+1} - \myM_{n}\):

\begin{align}
\Delta\myM_{n+1} &= \sum_{i=1}^{n+1} \myX_i\myX_i^\myT - (n+1)\,\bar\myX_{n+1}\bar\myX_{n+1}^\myT - \sum_{i=1}^{n} \myX_i\myX_i^\myT + n\,\bar\myX_{n}\bar\myX_{n}^\myT \nonumber \\
&= \myX_{n+1}\myX_{n+1}^\myT - (n+1)\,\bar\myX_{n+1}\bar\myX_{n+1}^\myT + n\,\bar\myX_{n}\bar\myX_{n}^\myT \label{eq:DeltaM2}
\end{align}

One can check (by expanding the products) that the above expression is actually identical to Eq. \eqref{eq:DeltaMResult}. So basically, we just found an easier way to derive Eq. \eqref{eq:DeltaMResult}. Nevertheless, after experimenting a little with different approaches, we can manage to simplify Eq. \eqref{eq:DeltaM2} slightly. The main idea is to get rid of the terms \(\bar\myX_{n+1}\bar\myX_{n+1}^\myT\) and \(\bar\myX_{n}\bar\myX_{n}^\myT\), since they are not that appealing for our on-line formulation. With the already known expression

\begin{equation}
\bar\myX_{n+1} = \bar\myX_{n} + \frac{\boldsymbol\delta_{n+1}}{n+1} \label{eq:xmeanNew}
\end{equation}

and with a rearranged formulation of the above equation

\begin{equation}
\bar\myX_{n} = \bar\myX_{n+1} - \frac{\boldsymbol\delta_{n+1}}{n+1} \label{eq:xmeanOld}
\end{equation}

we can start simplifying Eq. \eqref{eq:DeltaM2}. We insert Eq. \eqref{eq:xmeanNew} and Eq. \eqref{eq:xmeanOld} into Eq. \eqref{eq:DeltaM2} and see where it leads us:

\begin{align}
\Delta\myM_{n+1} &= \myX_{n+1}\myX_{n+1}^\myT - (n+1)\Big(\bar\myX_{n} + \frac{\boldsymbol\delta_{n+1}}{n+1}\Big)\bar\myX_{n+1}^\myT + n\,\bar\myX_{n}\Big(\bar\myX_{n+1} - \frac{\boldsymbol\delta_{n+1}}{n+1}\Big)^{\!\myT} \nonumber \\
&= \myX_{n+1}\myX_{n+1}^\myT - \myX_{n+1}\bar\myX_{n+1}^\myT - \frac{n}{n+1}\,\bar\myX_{n}\boldsymbol\delta_{n+1}^\myT \nonumber \\
&= \frac{n}{n+1}\,\myX_{n+1}\boldsymbol\delta_{n+1}^\myT - \frac{n}{n+1}\,\bar\myX_{n}\boldsymbol\delta_{n+1}^\myT \nonumber \\
&= \frac{n}{n+1}\,\boldsymbol\delta_{n+1}\boldsymbol\delta_{n+1}^\myT \label{eq:DeltaMSimple}
\end{align}

where \(\boldsymbol\delta_{n+1}\) is already defined in Eq. \eqref{eq:DeltaN}.

Unified Algorithmic Description

Overall, this results in only a few very simple update rules (initialized with \(\bar\myX_0 = \mathbf{0}\) and \(\myM_0 = \mathbf{0}\)), which can be executed in the following order:

\begin{align}
\boldsymbol\delta_{n+1} &= \myX_{n+1} - \bar\myX_{n} \\
\bar\myX_{n+1} &= \bar\myX_{n} + \frac{\boldsymbol\delta_{n+1}}{n+1} \\
\myM_{n+1} &= \myM_{n} + \frac{n}{n+1}\,\boldsymbol\delta_{n+1}\boldsymbol\delta_{n+1}^\myT \\
\mySigma_{n+1} &= \frac{1}{n}\,\myM_{n+1} \quad (n \ge 1)
\end{align}

In case of a univariate Gaussian distribution, the above instructions reduce to:

\begin{align}
\delta_{n+1} &= x_{n+1} - \bar x_{n} \\
\bar x_{n+1} &= \bar x_{n} + \frac{\delta_{n+1}}{n+1} \\
m_{n+1} &= m_{n} + \frac{n}{n+1}\,\delta_{n+1}^2 \\
s^2_{n+1} &= \frac{1}{n}\,m_{n+1}
\end{align}

where \(s^2_{n+1}\) is now the sample variance and \(\bar x_{n+1}\) the sample mean at time step \(n+1\).

Example

An implementation of the above update rules in R (using the equivalent scatter recursion from Eq. \eqref{eq:DeltaMResult}) could look as follows:

require(MASS)      # provides mvrnorm() for sampling from a multivariate Gaussian
require(ggplot2)
require(reshape2)

onlineEstimator <- function(X) {
    n = ncol(X)
    M = matrix(0, nrow=n, ncol=n)    # scatter matrix, initialized with zeros
    mean = matrix(rep(0,n))          # mean vector, initialized with zeros
    cov = M                          # covariance estimate
    for(i in 1:nrow(X)) {
      x = matrix(X[i,])              # current example as column vector
      delta = x - mean               # delta_i = x_i - mean_{i-1}
      mean = mean + delta / i        # online update of the mean
      M = M + delta %*% t(x - mean)  # scatter update (uses old and new mean)
      if(i > 1) cov = M / (i-1)      # unbiased covariance estimate
    }
    return (list(meanEst=mean, covEst=cov))
  }

Now let us specify a multivariate Gaussian with the parameters

\begin{equation}
\boldsymbol\mu = \begin{pmatrix} 1 \\ 10 \\ 20 \end{pmatrix}, \qquad
\mySigma = \begin{pmatrix} 1 & -1 & 0.5 \\ -1 & 3 & -1 \\ 0.5 & -1 & 3 \end{pmatrix}
\end{equation}

and draw a sample of size \(N = 1000\) from this distribution. Then we run over the drawn examples in an online manner and attempt to estimate the mean and covariance matrix again:

# Example: Sample from a pre-specified distribution and then try to estimate
# the distribution's parameters
covMat <- matrix(c(1,-1,.5,-1,3,-1,.5,-1,3), nrow=3)
N = 1000
X <- mvrnorm(n=N,Sigma = covMat,  mu = c(1,10,20), tol= 1e-13)
onlineEstimator(X)

After running through the sample of size \(N = 1000\) we could get the following output:

> onlineEstimator(X)
$meanEst
           [,1]
[1,]  0.9908257
[2,] 10.0214919
[3,] 20.0737200

$covEst
           [,1]      [,2]       [,3]
[1,]  1.0374998 -1.017537  0.5327917
[2,] -1.0175368  3.038815 -1.0029339
[3,]  0.5327917 -1.002934  3.0590012

We can also plot how the estimations change over time. The result could look something like this (note: only a few estimated components are shown):


Figure 1: Online estimation of the parameters of a Gaussian distribution. Shown are a few components of the mean vector and covariance matrix. After processing a sufficient number of examples the estimations converge towards the true values.

The above graph was generated with the following R code:

onlineEstimator <- function(X) {
    n = ncol(X)
    M = matrix(0, nrow=n, ncol=n)
    mean = matrix(rep(0,n))
    myMeans <- data.frame()   # history of the mean estimates
    myCovs <- data.frame()    # history of the covariance estimates
    for(i in 1:nrow(X)) {
      x = matrix(X[i,])
      delta = x - mean
      mean = mean + delta / i
      M = M + delta %*% t(x - mean)
      mycov <- if(i > 1) M / (i-1) else M
      myMeans <- rbind(myMeans, as.vector(mean))
      myCovs <- rbind(myCovs, as.vector(mycov))
    }
    return (list(meanEst=mean, covEst=mycov, allMeans=myMeans, allCovs=myCovs))
  }

  covMat <- matrix(c(1,-1,.5,-1,3,-1,.5,-1,3), nrow = 3)
  N = 1000
  X <- mvrnorm(n=N,Sigma = covMat,  mu = c(1,10,20), tol= 1e-13)
  res <- onlineEstimator(X)

  ts.plot(res$allMeans[,1])

  allEsts <- cbind(n=1:N,res$allMeans, res$allCovs)
  colnames(allEsts) <- c("n", paste("mean_", c(1:3), sep=""), paste("cov_", 1:9, sep=""))
  plotCols <- c(1:3,5,6,7,9)
  plotdf <- melt(allEsts[,plotCols], id.vars = "n")
  colnames(allEsts)[plotCols] # show the names for us
  legendLabels <- c(expression(bar(x)[1]), expression(bar(x)[2]), expression(Sigma[11]), expression(Sigma[12]), expression(Sigma[13]), expression(Sigma[22]) )
  ggplot(data = plotdf, aes(x=n, y=value, color = variable)) +
    geom_line() +
    scale_color_discrete(labels = legendLabels) +
    theme_bw() +
    theme(axis.text=element_text(size=13),
          axis.title=element_text(size=15),
          legend.text=element_text(size=13),
          legend.title=element_text(size=15))

Markus Thill
I studied computer engineering (B.Sc.) and Automation & IT (M.Eng.). Generally, I am interested in machine learning (ML) approaches (in the broadest sense), but particularly in the fields of time series analysis, anomaly detection, Reinforcement Learning (e.g. for board games), Deep Learning (DL) and incremental (on-line) learning procedures.
