The Relationship between the Mahalanobis Distance and the Chi-Squared Distribution


In practice, (multivariate) Gaussian distributions are sometimes used for anomaly detection tasks (assuming that the considered data is approximately normally distributed): the parameters of the Gaussian can be estimated using maximum likelihood estimation (MLE), where the maximum likelihood estimates are the sample mean and the sample covariance matrix. After estimating the parameters of the distribution one has to specify a critical value which separates the normal data from the anomalous data. Typically, this critical value is taken from the probability density function (PDF) in such a way that it is smaller than the PDF value of all normal data points in the data set. A new data point can then be classified as anomalous if the value of the PDF for this new point is below the critical value. Hence, the critical value specifies a boundary which is used to separate normal from anomalous data. In the univariate case the boundary separates the lower and upper tails of the Gaussian from its center (mean). In the 2-dimensional case the boundary is an ellipse around the center, and in higher dimensions the boundary can be described by an ellipsoid.

But what do we do if we want to find a boundary that separates the most unlikely 2% of the data points in a sample from the remaining 98%? In the univariate case this scenario is simple: we just have to compute the first percentile (1% quantile) and the 99th percentile. All points that end up in the two specified tails would then be classified as anomalous. For the multivariate case this is no longer straightforward, since our boundary has to be described by an ellipsoid. However, there is a way out of this problem, which has to do with the so-called Mahalanobis distance, as we will see in the following.
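
For the univariate scenario just described, a minimal sketch in R (with made-up data and the 1%/99% quantiles of the fitted Gaussian as boundary) could look as follows:

set.seed(42)
x <- rnorm(500, mean = 10, sd = 2)                        # made-up "normal" data
muhat <- mean(x); sdhat <- sd(x)                          # estimated parameters of the Gaussian
bounds <- qnorm(c(0.01, 0.99), mean = muhat, sd = sdhat)  # 1% and 99% quantiles of the fitted Gaussian
xnew <- c(3.2, 9.7, 17.5)                                 # new points to classify
xnew < bounds[1] | xnew > bounds[2]                       # TRUE = anomalous (falls into one of the tails)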

\( \def\matr#1{\mathbf #1} \def\tp{\mathsf T} \)

Prerequisites

Multiplication of a Matrix with its Transpose

Generally, the product of a matrix $\matr A \in \mathbb{R}^{m \times n}$ and a matrix $\matr B \in \mathbb{R}^{n \times p}$ is defined as:

$$ (\matr A \matr B)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} $$

Then, the multiplication of a matrix with its transpose can be written as:

$$ \matr A \matr A^\tp = \sum_{i=1}^{n} \vec a_i \vec a_i^\tp $$

where $\vec a_i$ is the $i$-th column vector of matrix $\matr A$. Another trivial relation required later is as follows:

$$ \vec x^\tp \vec a \, \vec a^\tp \vec x = (\vec x^\tp \vec a)(\vec a^\tp \vec x) = (\vec a^\tp \vec x)^2 \label{eq:multOfTwoSkalars} $$

which holds since the scalar $\vec x^\tp \vec a$ equals its transpose $\vec a^\tp \vec x$.
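
Both relations are easy to check numerically, e.g. with a small random matrix and random vectors (a quick sketch, not needed for the derivation):

A <- matrix(rnorm(12), nrow = 3)                                  # random 3 x 4 matrix
S <- Reduce(`+`, lapply(1:ncol(A), function(i) tcrossprod(A[, i])))
all.equal(A %*% t(A), S)                                          # sum of outer products of the columns
x <- rnorm(3); a <- rnorm(3)
all.equal(drop(t(x) %*% a %*% t(a) %*% x), drop(t(a) %*% x)^2)    # the "scalar" relation above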

Inverse of a Matrix-Product

$$ (\matr A \matr B)^{-1} = \matr B^{-1} \matr A^{-1} $$

since

$$ (\matr A \matr B)(\matr B^{-1} \matr A^{-1}) = \matr A (\matr B \matr B^{-1}) \matr A^{-1} = \matr A \matr A^{-1} = \matr I $$
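
Again, a quick numerical check (with two random, almost surely invertible matrices):

A <- crossprod(matrix(rnorm(9), 3)) + diag(3)     # random invertible matrix
B <- crossprod(matrix(rnorm(9), 3)) + diag(3)     # another random invertible matrix
all.equal(solve(A %*% B), solve(B) %*% solve(A))  # TRUE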

Eigenvalues and Eigenvectors

For a matrix $\matr A \in \mathbb{R}^{n \times n}$ solve:

$$ \matr A \vec v = \lambda \vec v $$

A value $\lambda$ which fulfills the equation is called an eigenvalue of $\matr A$ and the corresponding vector $\vec v$ is called an eigenvector. When the eigenvectors of $\matr A$ are arranged column-wise in a matrix $\matr V$, we have:

$$ \matr A = \matr V \matr \Lambda \matr V^{-1} $$

where $\matr \Lambda$ is a diagonal matrix containing the eigenvalues of the corresponding eigenvectors. This representation, where the matrix $\matr A$ is expressed in terms of its eigenvalues and eigenvectors, is also called eigenvalue decomposition. For symmetric matrices $\matr A$, the eigenvectors are orthogonal (orthonormal) and the matrix $\matr V$ is orthogonal as well (the product with its transpose is the identity matrix). In this case $\matr V^{-1} = \matr V^\tp$, and the equation above can be written as:

$$ \matr A = \matr V \matr \Lambda \matr V^\tp $$

In this case, also the square root of $\matr A$ (written here as $\matr A^{1/2}$) – such that $\matr A^{1/2} \matr A^{1/2} = \matr A$ – can be easily found to be:

$$ \matr A^{1/2} = \matr V \matr \Lambda^{1/2} \matr V^\tp \label{eq:sqrtSymMatrix} $$

since

$$ \matr A^{1/2} \matr A^{1/2} = \matr V \matr \Lambda^{1/2} \matr V^\tp \matr V \matr \Lambda^{1/2} \matr V^\tp = \matr V \matr \Lambda^{1/2} \matr \Lambda^{1/2} \matr V^\tp = \matr V \matr \Lambda \matr V^\tp = \matr A $$

The eigenvalue decomposition of the inverse of a matrix can be computed as follows, using the relation for the inverse of a matrix product described above and the associative property of the matrix product:

$$ \matr A^{-1} = (\matr V \matr \Lambda \matr V^\tp)^{-1} = (\matr V^\tp)^{-1} \matr \Lambda^{-1} \matr V^{-1} = \matr V \matr \Lambda^{-1} \matr V^\tp $$

Note that $\matr \Lambda^{-1}$ is again a diagonal matrix, containing the reciprocals of the eigenvalues of $\matr A$.
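
These identities can be verified with base R's eigen() for a small random symmetric matrix (a quick sketch):

A <- crossprod(matrix(rnorm(9), 3)) + diag(3)            # random symmetric positive-definite matrix
e <- eigen(A)                                            # orthonormal eigenvectors for symmetric input
V <- e$vectors
L <- diag(e$values)
all.equal(A, V %*% L %*% t(V))                           # eigenvalue decomposition A = V Lambda V^T
Asqrt <- V %*% diag(sqrt(e$values)) %*% t(V)             # A^(1/2) = V Lambda^(1/2) V^T
all.equal(A, Asqrt %*% Asqrt)                            # square-root property
all.equal(solve(A), V %*% diag(1 / e$values) %*% t(V))   # inverse via the eigenvalue decomposition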

Linear Affine Transform of a Normally Distributed Random Variable

Assume we apply a linear affine transform to a random variable $\matr X$ with a mean vector $\vec \mu$ and a covariance matrix $\matr \Sigma$ in order to create a new random variable $\matr Y$:

$$ \matr Y = \matr A \matr X + \vec b $$

One can compute the new mean and covariance matrix for $\matr Y$:

$$ \mathbb{E}[\matr Y] = \mathbb{E}[\matr A \matr X + \vec b] = \matr A \vec \mu + \vec b \label{eq:AffineLinearTransformMean} $$

and

$$ \operatorname{Cov}[\matr Y] = \operatorname{Cov}[\matr A \matr X + \vec b] = \matr A \matr \Sigma \matr A^\tp \label{eq:AffineLinearTransformCovariance} $$
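
A short Monte Carlo check of both properties (the distribution parameters and the transform $\matr A$, $\vec b$ below are arbitrary choices):

library(MASS)
mu <- c(1, -2)
Sigma <- matrix(c(2, 0.5, 0.5, 1), nrow = 2)     # arbitrary covariance matrix
A <- matrix(c(1, 2, 0, -1), nrow = 2)            # arbitrary transform matrix
b <- c(3, 3)                                     # arbitrary offset vector
X <- mvrnorm(n = 100000, mu = mu, Sigma = Sigma)
Y <- t(A %*% t(X) + b)                           # apply y = A x + b to every sample
colMeans(Y)                                      # approx. A %*% mu + b
A %*% mu + b
cov(Y)                                           # approx. A %*% Sigma %*% t(A)
A %*% Sigma %*% t(A)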

Quantile Estimation for Multivariate Gaussian Distributions

  • Calculating quantiles for multivariate normal distributions is not as trivial as in the one-dimensional case, since we cannot simply compute the integral in the tails of the distribution
  • The quantiles in the bivariate case can be seen as ellipses, in higher dimensions as ellipsoids
  • The Mahalanobis distance is an interesting measure to describe all points on the surface of an ellipsoid.
  • More formally: The usual quantile definition requires a random variable: the p-quantile for a random variable $Z$ is the value $z_p$ that fulfills $P(Z \le z_p) = p$. In the case of a multivariate normal distribution we can take the squared Mahalanobis distance $D^2$ between a point of the multivariate normal distribution and its mean as such a random variable. Then the p-quantile computation will answer the following question: Which value $d_p^2$ is required so that a random point $\matr X$ fulfills $P(D^2 \le d_p^2) = p$? In other words, when we pick a random point from the distribution, it will with probability p have a squared Mahalanobis distance equal to or smaller than $d_p^2$. The set of points with $D^2 = d_p^2$ forms an ellipsoid.
  • In a naive solution one can use a Monte Carlo approach to sample the multivariate normal distribution and compute the quantile based on the Mahalanobis distances of the elements of the sample
  • However, this Monte Carlo approach is rather computationally inefficient, especially if quantiles have to be computed very often
  • One can show that the squared Mahalanobis distance for a Gaussian distribution is actually Chi-Square distributed with d degrees of freedom, where d is the dimension of the Gaussian; a small numerical comparison of the Monte Carlo approach and the Chi-Square quantile is sketched below.
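
The following sketch (with an arbitrarily chosen 3-dimensional Gaussian; mu and Sigma are made up for illustration) compares the naive Monte Carlo estimate of the 98% boundary with the closed-form Chi-Square quantile:

library(MASS)
mu <- c(0, 0, 0)
Sigma <- diag(3) + 0.3                           # arbitrary positive-definite covariance matrix
S <- mvrnorm(n = 100000, mu = mu, Sigma = Sigma)
d2 <- mahalanobis(S, center = mu, cov = Sigma)   # squared Mahalanobis distances of the sample
quantile(d2, probs = 0.98)                       # Monte Carlo estimate of the 98% boundary
qchisq(0.98, df = 3)                             # closed-form Chi-Square quantile (should be close)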

Empirical Results Suggesting that the Squared Mahalanobis Distance is Chi-Square Distributed

In a quantile-quantile (QQ) plot one can see that the quantiles of the squared Mahalanobis distances of a sample drawn from a Gaussian distribution are very similar to the corresponding quantiles of the Chi-Square distribution. The following R script shows this:

library(Matrix)
library(MASS)
library(ggplot2)
DIM = 10
nSample = 1000

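# Posdef(): generate a random n x n positive-definite matrix with the given
# eigenvalues (drawn uniformly from (0, 1) by default), via a random orthogonal basis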
Posdef <- function (n, ev = runif(n, 0, 1))
{
  Z <- matrix(ncol=n, rnorm(n^2))
  decomp <- qr(Z)
  Q <- qr.Q(decomp)
  R <- qr.R(decomp)
  d <- diag(R)
  ph <- d / abs(d)
  O <- Q %*% diag(ph)
  Z <- t(O) %*% diag(ev) %*% O
  return(Z)
}

Sigma = Posdef(DIM)
muhat = rnorm(DIM)


sample <- mvrnorm(n=nSample, mu = muhat, Sigma = Sigma)
C <- .5*log(det(2*pi*Sigma))
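# (C above is the log-normalization constant of the Gaussian density; it is not used below)
# mahalanobis() returns the SQUARED Mahalanobis distance of each sample row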
mahaDist2 <- mahalanobis(x=sample, center=muhat,cov=Sigma)

#
# Interestingly, the squared Mahalanobis distance of the samples follows a Chi-Square
# distribution with DIM degrees of freedom
#
pps <- (1:100)/(100+1)
qq1 <- sapply(X = pps, FUN = function(x) {quantile(mahaDist2, probs = x) })
qq2 <-  sapply(X = pps, FUN = qchisq, df=ncol(Sigma))

dat <- data.frame(qEmp= qq1, qChiSq=qq2)
p <- ggplot(data = dat) + geom_point(aes(x=qEmp, y=qChiSq)) +
  xlab("Sample quantile") +
  ylab("Chi-Squared Quantile") +
  geom_abline(slope=1)
plot(p)

Figure: QQ plot of the empirical quantiles of the squared Mahalanobis distances against the quantiles of the Chi-Squared distribution with DIM = 10 degrees of freedom; the points lie close to the identity line.

The Squared Mahalanobis Distance Follows a Chi-Square Distribution: A More Formal Derivation

The Mahalanobis distance between two points $\vec x$ and $\vec y$ is defined as

$$ D(\vec x, \vec y) = \sqrt{(\vec x - \vec y)^\tp \matr \Sigma^{-1} (\vec x - \vec y)} $$

Thus, the squared Mahalanobis distance of a random vector $\matr X$ and the center $\vec \mu$ of a multivariate Gaussian distribution is defined as:

$$ D^2 = (\matr X - \vec \mu)^\tp \matr \Sigma^{-1} (\matr X - \vec \mu) \label{eq:sqMahalanobis} $$

where $\matr \Sigma$ is a covariance matrix and $\vec \mu$ is the mean vector. In order to achieve a different representation of $D^2$ one can first perform an eigenvalue decomposition on $\matr \Sigma$, which is (with the eigenvalue decomposition for symmetric matrices from above and assuming orthonormal eigenvectors):

$$ \matr \Sigma = \matr V \matr \Lambda \matr V^\tp $$

With the eigenvalue decomposition of the inverse matrix and the sum representation for the product of a matrix with its transpose we get:

$$ \matr \Sigma^{-1} = \matr V \matr \Lambda^{-1} \matr V^\tp = \sum_{i=1}^{d} \frac{1}{\lambda_i} \vec v_i \vec v_i^\tp \label{eq:SigmaInverseAsSum} $$

where $\vec v_i$ is the $i$-th eigenvector of the corresponding eigenvalue $\lambda_i$. Plugging \eqref{eq:SigmaInverseAsSum} back into \eqref{eq:sqMahalanobis} results in:

$$ D^2 = (\matr X - \vec \mu)^\tp \Big( \sum_{i=1}^{d} \frac{1}{\lambda_i} \vec v_i \vec v_i^\tp \Big) (\matr X - \vec \mu) = \sum_{i=1}^{d} \frac{1}{\lambda_i} (\matr X - \vec \mu)^\tp \vec v_i \vec v_i^\tp (\matr X - \vec \mu) $$

With Eq. \eqref{eq:multOfTwoSkalars} one gets:

$$ D^2 = \sum_{i=1}^{d} \frac{1}{\lambda_i} \big( \vec v_i^\tp (\matr X - \vec \mu) \big)^2 = \sum_{i=1}^{d} \frac{y_i^2}{\lambda_i} $$

where $y_i = \vec v_i^\tp (\matr X - \vec \mu) = \vec v_i^\tp \matr X - \vec v_i^\tp \vec \mu$ is a new random variable based on an affine linear transform of the random vector $\matr X$. According to Eq. \eqref{eq:AffineLinearTransformMean}, we have $\mathbb{E}[y_i] = \vec v_i^\tp \vec \mu + b$. If we set $b = -\vec v_i^\tp \vec \mu$ then we get $\mathbb{E}[y_i] = 0$. Note that $y_i$ is now a random variable drawn from a univariate normal distribution $\mathcal{N}(0, \sigma_i^2)$, where, according to \eqref{eq:AffineLinearTransformCovariance}:

$$ \sigma_i^2 = \vec v_i^\tp \matr \Sigma \vec v_i \label{eq:smallSigma} $$

If we insert

$$ \matr \Sigma = \sum_{j=1}^{d} \lambda_j \vec v_j \vec v_j^\tp $$

into Eq. \eqref{eq:smallSigma}, we get:

$$ \sigma_i^2 = \vec v_i^\tp \Big( \sum_{j=1}^{d} \lambda_j \vec v_j \vec v_j^\tp \Big) \vec v_i = \sum_{j=1}^{d} \lambda_j (\vec v_i^\tp \vec v_j)(\vec v_j^\tp \vec v_i) $$

Since all eigenvectors are pairwise orthonormal, the dot products $\vec v_i^\tp \vec v_j$ and $\vec v_j^\tp \vec v_i$ will be zero for $i \neq j$. Only for the case $i = j$ we get:

$$ \sigma_i^2 = \lambda_i (\vec v_i^\tp \vec v_i)(\vec v_i^\tp \vec v_i) = \lambda_i $$

since the norm of an orthonormal eigenvector is equal to 1. The squared Mahalanobis distance can be expressed as:

$$ D^2 = \sum_{i=1}^{d} \frac{y_i^2}{\lambda_i} = \sum_{i=1}^{d} \Big( \frac{y_i}{\sqrt{\lambda_i}} \Big)^2 = \sum_{i=1}^{d} z_i^2 $$

where

$$ z_i = \frac{y_i}{\sqrt{\lambda_i}} \sim \mathcal{N}(0, 1) $$

Now the Chi-square distribution with $d$ degrees of freedom is exactly defined as the distribution of a variable which is the sum of the squares of $d$ random variables that are standard normally distributed. Hence, $D^2$ is Chi-square distributed with $d$ degrees of freedom.
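
Continuing with the objects Sigma, muhat, sample and mahaDist2 from the R script above, this decomposition can be checked numerically (a small sketch):

e <- eigen(Sigma)                                 # orthonormal eigenvectors and eigenvalues of Sigma
Y <- t(e$vectors) %*% (t(sample) - muhat)         # row i holds y_i = v_i^T (x - mu) for every sample
d2 <- colSums(Y^2 / e$values)                     # sum_i y_i^2 / lambda_i
all.equal(as.numeric(d2), as.numeric(mahaDist2))  # matches the squared Mahalanobis distances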

Derivation based on the Whitening Property of the Mahalanobis Distance

Since the inverse $\matr \Sigma^{-1}$ of the covariance matrix is also a symmetric matrix, its square root can be found – based on Eq. \eqref{eq:sqrtSymMatrix} – to be a symmetric matrix $\matr W = \matr \Sigma^{-1/2} = \matr V \matr \Lambda^{-1/2} \matr V^\tp$. In this case we can write the squared Mahalanobis distance as

$$ D^2 = (\matr X - \vec \mu)^\tp \matr \Sigma^{-1} (\matr X - \vec \mu) = (\matr X - \vec \mu)^\tp \matr W^\tp \matr W (\matr X - \vec \mu) = \big( \matr W (\matr X - \vec \mu) \big)^\tp \big( \matr W (\matr X - \vec \mu) \big) = \matr Z^\tp \matr Z $$

The multiplication $\matr Z = \matr W (\matr X - \vec \mu)$, with $\matr W = \matr \Sigma^{-1/2}$ and $\matr X \sim \mathcal{N}(\vec \mu, \matr \Sigma)$, is typically referred to as a whitening transform, where in this case $\matr W$ is the so-called Mahalanobis (or ZCA) whitening matrix. $\matr Z$ has zero mean, since $\mathbb{E}[\matr Z] = \matr W (\mathbb{E}[\matr X] - \vec \mu) = \vec 0$. Due to the (linear) whitening transform the new covariance matrix is the identity matrix $\matr I$, as shown in the following (using the property in Eq. \eqref{eq:AffineLinearTransformCovariance}):

$$ \operatorname{Cov}[\matr Z] = \matr W \matr \Sigma \matr W^\tp = \matr \Sigma^{-1/2} \matr \Sigma \matr \Sigma^{-1/2} = \matr \Sigma^{-1/2} \matr \Sigma^{1/2} \matr \Sigma^{1/2} \matr \Sigma^{-1/2} = \matr I $$

Hence, all elements $z_i$ in the random vector $\matr Z$ are random variables drawn from independent standard normal distributions $\mathcal{N}(0, 1)$, which leads us to the same conclusion as before, that $D^2 = \matr Z^\tp \matr Z = \sum_{i=1}^{d} z_i^2$ is Chi-square distributed with $d$ degrees of freedom.
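
The whitening argument can also be checked numerically, again reusing Sigma, muhat, sample and mahaDist2 from the R script above (a small sketch):

e <- eigen(Sigma)
W <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)   # Sigma^(-1/2): Mahalanobis / ZCA whitening matrix
Z <- t(W %*% (t(sample) - muhat))                              # whitened samples
round(cov(Z), 2)                                               # approximately the identity matrix
all.equal(as.numeric(rowSums(Z^2)), as.numeric(mahaDist2))     # Z^T Z equals the squared Mahalanobis distance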
