On correlation and the dot product

We are going to implement Pearson product-moment correlation.

We have two vectors, \(\mathbf{x}\) and \(\mathbf{y}\), each with \(N\) values: \(\mathbf{x} = [x_0, x_1, \ldots, x_{N-1}]\) and \(\mathbf{y} = [y_0, y_1, \ldots, y_{N-1}]\).

The sample mean of \(\mathbf{x}\) is:

\[\bar{x}=\frac{1}{N}\sum_{i=0}^{N-1} x_i\]

In NumPy, this is just np.mean(x).
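As a quick check, the sample mean formula can be written out term by term and compared with np.mean. This is only a sketch: the values in x are made up for illustration.

```python
import numpy as np

# Illustrative values only; any 1-D array of numbers would do.
x = np.array([2.0, 4.0, 6.0, 8.0])
N = len(x)

# The sample mean written out as in the formula: (1 / N) * sum of x_i.
x_bar = sum(x[i] for i in range(N)) / N

# np.mean computes the same quantity.
assert np.isclose(x_bar, np.mean(x))
```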

The Pearson product-moment correlation coefficient between two vectors \(\mathbf{x}, \mathbf{y}\) is defined as:

\[r_{xy} =\frac{\sum ^{N-1} _{i=0}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^{N-1} _{i=0}(x_i - \bar{x})^2} \sqrt{\sum ^{N-1} _{i=0}(y_i - \bar{y})^2}}\]

\((x_i - \bar{x})\) is the \(i\)th element of the vector \(\mathbf{x}\) after it has been mean-centered (the mean-centered vector has a sample mean of zero).
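One way to turn the definition of \(r_{xy}\) into code is to mean-center both vectors and then note that each sum in the formula is a dot product. The sketch below assumes both inputs are 1-D, have the same length, and are not constant (otherwise the denominator is zero). The function name pearson_r and the example values are our own choices for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation of two equal-length 1-D vectors (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Mean-center each vector: subtract its sample mean from every element.
    xc = x - np.mean(x)
    yc = y - np.mean(y)

    # Numerator: sum of (x_i - x_bar)(y_i - y_bar), i.e. the dot product xc . yc.
    numerator = np.dot(xc, yc)

    # Denominator: product of the square roots of the sums of squares.
    denominator = np.sqrt(np.dot(xc, xc)) * np.sqrt(np.dot(yc, yc))

    return numerator / denominator

# Illustrative values only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
r = pearson_r(x, y)

# Cross-check against NumPy's built-in correlation matrix.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```

For these values the numerator is 8 and both sums of squares are 10, so r comes out as 0.8.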