Derivation of a Weighted Recursive Linear Least Squares Estimator

\(
\let\vec\mathbf \def\myT{\mathsf{T}} \def\mydelta{\boldsymbol{\delta}} \def\matr#1{\mathbf #1} \)

In this post we derive an incremental (recursive) version of the weighted least squares estimator that was described in a previous blog post.

We start with the original closed form formulation of the weighted least squares estimator:

\begin{align} \boldsymbol{\theta} = \big(\matr X^\myT \matr W \matr X + \lambda \matr I\big)^{-1} \matr X^\myT \matr W \vec y. \end{align}

where \(\matr X \in \mathbb{R}^{n \times k}\) is a matrix containing the \(n\) inputs \(\vec x_i\) of length \(k\) as row-vectors, \(\matr W \in \mathbb{R}^{n \times n}\) is a diagonal weight matrix containing a weight \(w_i\) for each of the \(n\) observations, and \(\vec y\) is the \(n\)-dimensional output vector containing one value for each input vector (we could easily extend our explanations to multi-dimensional outputs, where we would instead use a matrix \(\matr Y\)). The term \(\lambda \matr I\) (regularization factor \(\lambda\) and identity matrix \(\matr I\)) is the so-called regularizer, which is used to prevent overfitting.
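To make the notation concrete, here is a small NumPy sketch of this closed-form estimator (the function name `weighted_ls` and the argument names are illustrative only):

```python
import numpy as np

def weighted_ls(X, y, w, lam=1.0):
    """Closed-form weighted least squares with L2 regularization.

    X   : (n, k) matrix with one input row-vector per observation
    y   : (n,)   output vector
    w   : (n,)   observation weights (the diagonal of W)
    lam : regularization factor lambda
    """
    k = X.shape[1]
    W = np.diag(w)
    A = X.T @ W @ X + lam * np.eye(k)   # X^T W X + lambda * I
    b = X.T @ W @ y                     # X^T W y
    return np.linalg.solve(A, b)        # theta
```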

Since we have \(n\) observations, we can also slightly modify the above equation to indicate the current iteration \(n\):

\begin{align} \boldsymbol{\theta}_n = \big(\matr X_n^\myT \matr W_n \matr X_n + \lambda \matr I\big)^{-1} \matr X_n^\myT \matr W_n \vec y_n. \label{eq:weightedRLS} \end{align}

The dimensions are as follows:

\begin{align} \matr X_n \in \mathbb{R}^{n \times k}, \ \matr W_n \in \mathbb{R}^{n \times n}, \ \vec y_n \in \mathbb{R}^{n}, \ \boldsymbol{\theta}_n \in \mathbb{R}^{k}, \ \lambda \in \mathbb{R}, \ \matr I \in \mathbb{R}^{k \times k}. \end{align}

If now a new observation pair \((\vec x_{n+1}, y_{n+1})\) with weight \(w_{n+1}\) arrives, some of the above matrices and vectors change as follows (the others remain unchanged):

\begin{align} \matr X_{n+1} = \begin{pmatrix} \matr X_n \\ \vec x_{n+1}^\myT \end{pmatrix}, \quad \matr W_{n+1} = \begin{pmatrix} \matr W_n & \vec 0 \\ \vec 0^\myT & w_{n+1} \end{pmatrix}, \quad \vec y_{n+1} = \begin{pmatrix} \vec y_n \\ y_{n+1} \end{pmatrix}, \label{eq:newpoint} \end{align}

where

\begin{align} \matr X_{n+1} \in \mathbb{R}^{(n+1) \times k}, \ \vec x_{n+1} \in \mathbb{R}^{k}, \ \matr W_{n+1} \in \mathbb{R}^{(n+1) \times (n+1)}, \ w_{n+1} \in \mathbb{R}, \ \vec y_{n+1} \in \mathbb{R}^{n+1}, \ y_{n+1} \in \mathbb{R}. \end{align}

Now let us insert Eq. \eqref{eq:newpoint} into Eq. \eqref{eq:weightedRLS} and see what changes:

\begin{align} \boldsymbol{\theta}_{n+1} = \big(\matr X_{n+1}^\myT \matr W_{n+1} \matr X_{n+1} + \lambda \matr I\big)^{-1} \matr X_{n+1}^\myT \matr W_{n+1} \vec y_{n+1} = \matr A_{n+1}^{-1} \vec b_{n+1}, \label{eq:phi} \end{align}

where we identified:

\begin{align} \matr G_{n+1} &= \matr X_{n+1}^\myT \matr W_{n+1}, \label{eq:Gnp1} \\ \matr A_{n+1} &= \matr G_{n+1} \matr X_{n+1} + \lambda \matr I, \label{eq:Ap1} \\ \vec b_{n+1} &= \matr G_{n+1} \vec y_{n+1}, \label{eq:Bp1} \end{align}

with the dimensions

\begin{align} \matr G_{n+1} \in \mathbb{R}^{k \times (n+1)}, \ \matr A_{n+1} \in \mathbb{R}^{k \times k}, \ \vec b_{n+1} \in \mathbb{R}^{k}. \end{align}

Now let us expand equation \eqref{eq:Gnp1}:

\begin{align} \matr G_{n+1} = \matr X_{n+1}^\myT \matr W_{n+1} = \begin{pmatrix} \matr X_n^\myT & \vec x_{n+1} \end{pmatrix} \begin{pmatrix} \matr W_n & \vec 0 \\ \vec 0^\myT & w_{n+1} \end{pmatrix} = \begin{pmatrix} \matr G_n & w_{n+1} \vec x_{n+1} \end{pmatrix}, \end{align}

where \(\matr G_n = \matr X_n^\myT \matr W_n \in \mathbb{R}^{k \times n}\). Analogously to Eq. \eqref{eq:Ap1} and Eq. \eqref{eq:Bp1}, we also write \(\matr A_n = \matr G_n \matr X_n + \lambda \matr I\) and \(\vec b_n = \matr G_n \vec y_n\), so that \(\boldsymbol{\theta}_n = \matr A_n^{-1} \vec b_n\).

In the next step, let us evaluate \(\matr A_{n+1}\) from Eq. \eqref{eq:Ap1}:

\begin{align} \matr A_{n+1} &= \matr G_{n+1} \matr X_{n+1} + \lambda \matr I \\ &= \begin{pmatrix} \matr G_n & w_{n+1} \vec x_{n+1} \end{pmatrix} \begin{pmatrix} \matr X_n \\ \vec x_{n+1}^\myT \end{pmatrix} + \lambda \matr I \\ &= \matr G_n \matr X_n + \lambda \matr I + w_{n+1} \vec x_{n+1} \vec x_{n+1}^\myT \\ &= \matr A_n + w_{n+1} \vec x_{n+1} \vec x_{n+1}^\myT. \end{align}

Since we have to compute the inverse of \(\matr A_{n+1}\), which is costly, it would be helpful to find an incremental formulation. In this case, the Sherman-Morrison formula can help us:

\begin{align} \big(\matr A + \vec u \vec v^\myT\big)^{-1} = \matr A^{-1} - \frac{\matr A^{-1} \vec u \vec v^\myT \matr A^{-1}}{1 + \vec v^\myT \matr A^{-1} \vec u}. \end{align}

In our case, with \(\vec u = w_{n+1} \vec x_{n+1}\) and \(\vec v = \vec x_{n+1}\), this leads to:

\begin{align} \matr A_{n+1}^{-1} = \big(\matr A_n + w_{n+1} \vec x_{n+1} \vec x_{n+1}^\myT\big)^{-1} = \matr A_n^{-1} - \frac{w_{n+1} \matr A_n^{-1} \vec x_{n+1} \vec x_{n+1}^\myT \matr A_n^{-1}}{1 + w_{n+1} \vec x_{n+1}^\myT \matr A_n^{-1} \vec x_{n+1}}. \label{eq:Ap1inv} \end{align}
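As a quick sanity check of this rank-one update of the inverse, we can compare it numerically against a direct inversion (a small illustrative sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
M = rng.normal(size=(k, k))
A_n = M @ M.T + np.eye(k)          # some symmetric positive definite A_n
x = rng.normal(size=k)             # new input vector x_{n+1}
w = 0.7                            # its weight w_{n+1}

A_n_inv = np.linalg.inv(A_n)
Ax = A_n_inv @ x
# Sherman-Morrison update for (A_n + w * x x^T)^{-1}
A_np1_inv = A_n_inv - (w / (1.0 + w * (x @ Ax))) * np.outer(Ax, Ax)

# Compare against the directly computed inverse
print(np.allclose(A_np1_inv, np.linalg.inv(A_n + w * np.outer(x, x))))  # True
```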

Then, we expand Eq. \eqref{eq:Bp1}:

\begin{align} \vec b_{n+1} = \matr G_{n+1} \vec y_{n+1} = \begin{pmatrix} \matr G_n & w_{n+1} \vec x_{n+1} \end{pmatrix} \begin{pmatrix} \vec y_n \\ y_{n+1} \end{pmatrix} = \vec b_n + w_{n+1} y_{n+1} \vec x_{n+1}. \label{eq:Bp1new} \end{align}

Now let us insert the results of \eqref{eq:Ap1inv} and \eqref{eq:Bp1new} into Eq. \eqref{eq:phi} and then simplify the expression:

Note that we used the definition

to make our equation look simpler. Although we did a few rearrangements, it seems like Eq. \eqref{eq:areWeDone} cannot be simplified further. However, with a small trick we can actually find a nicer solution. For this purpose, let us look closer at Eq. \eqref{eq:deltaa} and play with it a little:

Interestingly, we can also find the RHS of Eq. \eqref{delta-simple} in Eq. \eqref{eq:areWeDone}. If we use the above relation, we can therefore simplify \eqref{eq:areWeDone} significantly:
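Written out in terms of the quantities derived above (i.e., without the \(\mydelta\) shorthand), this is the well-known recursive least squares update:

\begin{align} \boldsymbol{\theta}_{n+1} = \boldsymbol{\theta}_n + \frac{w_{n+1} \matr A_n^{-1} \vec x_{n+1}}{1 + w_{n+1} \vec x_{n+1}^\myT \matr A_n^{-1} \vec x_{n+1}} \big( y_{n+1} - \vec x_{n+1}^\myT \boldsymbol{\theta}_n \big). \end{align}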

This means that the above update rule performs a step in the parameter space, which is given by the gain vector \(\frac{w_{n+1} \matr A_n^{-1} \vec x_{n+1}}{1 + w_{n+1} \vec x_{n+1}^\myT \matr A_n^{-1} \vec x_{n+1}}\) scaled by the prediction error \(y_{n+1} - \vec x_{n+1}^\myT \boldsymbol{\theta}_n\) for the new point \((\vec x_{n+1}, y_{n+1})\). If the prediction error is large, the step taken will also be large. If the prediction error for the new point is zero, the parameter vector remains unaltered.

Let us summarize our findings in an algorithmic description of the recursive weighted least squares algorithm:
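A compact NumPy sketch of this algorithm is given below (the class name, the method names, and the initialization \(\matr A_0^{-1} = \frac{1}{\lambda} \matr I\), \(\boldsymbol{\theta}_0 = \vec 0\) are illustrative choices, not prescribed by the derivation):

```python
import numpy as np

class RecursiveWeightedLS:
    """Recursive weighted least squares with L2 regularization."""

    def __init__(self, k, lam=1.0):
        # With zero observations A_0 = lambda * I, hence A_0^{-1} = I / lambda
        # and theta_0 = A_0^{-1} b_0 = 0.
        self.A_inv = np.eye(k) / lam
        self.theta = np.zeros(k)

    def update(self, x, y, w=1.0):
        """Incorporate one new observation (x, y) with weight w."""
        x = np.asarray(x, dtype=float)
        Ax = self.A_inv @ x                     # A_n^{-1} x_{n+1}
        denom = 1.0 + w * (x @ Ax)              # 1 + w x^T A_n^{-1} x
        gain = (w / denom) * Ax                 # gain vector
        self.theta = self.theta + gain * (y - x @ self.theta)
        # Sherman-Morrison update of the stored inverse A_{n+1}^{-1}
        self.A_inv = self.A_inv - (w / denom) * np.outer(Ax, Ax)
        return self.theta

    def predict(self, x):
        return np.asarray(x, dtype=float) @ self.theta
```

Each call to `update()` only requires \(O(k^2)\) operations, whereas re-evaluating the closed-form solution after every new observation would cost \(O(nk^2 + k^3)\). Feeding the observations in one by one reproduces, up to numerical error, the batch solution of the `weighted_ls` sketch above.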
