Online and Batch-Incremental Estimation of Covariance Matrices and Means in Python

Estimating the mean and covariance matrix of a dataset is a cornerstone of multivariate statistics and machine learning. While batch (offline) methods are straightforward when all data is available at once, many modern applications require online or streaming estimation — where data arrives sequentially, potentially at very high rates, and storing all past samples is infeasible.

In the Jupyter Notebook below, we explore fully online and batch-incremental estimators for the mean and the covariance matrix, as well as for the inverse covariance matrix. We look at how these algorithms work, why they are useful, and how they can adapt to non-stationary data through a forgetting factor. Importantly, these approaches update the inverse directly with each new sample or batch, so we never have to recompute a matrix inverse from scratch.
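To make the idea concrete before diving into the notebook, here is a minimal sketch of one common formulation: an exponentially weighted mean and covariance with a forgetting factor, where the inverse covariance is kept up to date with the Sherman-Morrison rank-one formula. This is an illustration under assumptions, not necessarily the exact algorithm used in the notebook: the class name `EWMeanCov`, the parameter name `lam`, and the identity-matrix initialization are all hypothetical choices made for this example.

```python
import numpy as np


class EWMeanCov:
    """Exponentially weighted mean/covariance tracker (illustrative sketch)."""

    def __init__(self, dim, lam=0.99):
        if not 0.0 < lam < 1.0:
            raise ValueError("lam must lie strictly between 0 and 1")
        self.lam = lam                 # forgetting factor (assumed name)
        self.mean = np.zeros(dim)      # assumed starting point
        self.cov = np.eye(dim)         # assumed prior covariance
        self.inv_cov = np.eye(dim)     # inverse of the prior covariance

    def update(self, x):
        x = np.asarray(x, dtype=float)
        lam = self.lam
        d = x - self.mean              # deviation from the *old* mean

        # Exponentially weighted mean: old information decays by lam.
        self.mean = self.mean + (1.0 - lam) * d

        # Matching covariance update: cov <- lam * (cov + (1 - lam) * d d^T).
        self.cov = lam * self.cov + lam * (1.0 - lam) * np.outer(d, d)

        # Sherman-Morrison applied to the same rank-one update, so the
        # inverse is refreshed in O(dim^2) without calling a matrix inverse.
        Pd = self.inv_cov @ d
        denom = 1.0 + (1.0 - lam) * (d @ Pd)
        self.inv_cov = (self.inv_cov - (1.0 - lam) * np.outer(Pd, Pd) / denom) / lam


# Small demo on synthetic, correlated data.
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 0.5]])
est = EWMeanCov(dim=3, lam=0.99)
for x in rng.standard_normal((500, 3)) @ A.T:
    est.update(x)

# The tracked inverse should agree with the tracked covariance.
residual = np.abs(est.cov @ est.inv_cov - np.eye(3)).max()
print("max |cov @ inv_cov - I| =", residual)
```

A value of `lam` close to 1 gives a long memory and smooth estimates, while smaller values make the estimator react faster to changes in the data at the cost of more noise. In this sketch the inverse is only ever touched through rank-one corrections, which is the property that makes such estimators attractive for high-rate streams.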

The goal is to provide both the mathematical background and practical insights for applying online covariance and mean estimation in real-world scenarios such as anomaly detection, adaptive systems, and streaming analytics.

Related posts: The covariance matrix estimated here is a key ingredient for the Mahalanobis distance — which connects to the Chi-square distribution for anomaly detection thresholds. For Python implementations and benchmarks, see Implementing the Mahalanobis Distance in Python.
