# On correlation and the dot product

We are going to implement the Pearson product-moment correlation coefficient.

We have two vectors, $$\mathbf{x}$$ and $$\mathbf{y}$$, each with $$N$$ values: $$\mathbf{x} = [x_0, x_1, \ldots, x_{N-1}]$$ and $$\mathbf{y} = [y_0, y_1, \ldots, y_{N-1}]$$.

The sample mean of $$\mathbf{x}$$ is:

$\bar{x}=\frac{1}{N}\sum_{i=0}^{N-1} x_i$

In NumPy this is just `np.mean(x)`.
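As a quick check, the sample mean formula can be written out by hand and compared with NumPy's built-in. This is only a sketch; the values in `x` are made up for illustration.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])
N = len(x)

# Sum the values x_0 ... x_{N-1} and divide by N, as in the formula.
x_bar = sum(x[i] for i in range(N)) / N

# NumPy's built-in should give the same value.
assert np.isclose(x_bar, np.mean(x))
print(x_bar)
```

The explicit loop is there only to mirror the summation; in practice `np.mean(x)` is shorter and faster.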

The Pearson product-moment correlation coefficient between two vectors $$\mathbf{x}, \mathbf{y}$$ is defined as:

$r_{xy} =\frac{\sum ^{N-1} _{i=0}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum ^{N-1} _{i=0}(x_i - \bar{x})^2} \sqrt{\sum ^{N-1} _{i=0}(y_i - \bar{y})^2}}$

The values $$(x_i - \bar{x})$$ are the elements of the vector $$\mathbf{x}$$ after it has been mean-centered (the centered vector has a sample mean of zero).
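The formula for $$r_{xy}$$ can be translated into NumPy almost term by term. The following is a minimal sketch, not a definitive implementation: the function name `pearson_r` and the example values are assumptions made for illustration, and the result is compared against NumPy's own `np.corrcoef` as a sanity check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation, following the formula term by term.

    Illustrative helper; the name is not from any library.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Mean-center both vectors.
    x_centered = x - np.mean(x)
    y_centered = y - np.mean(y)
    # Numerator: sum of products of the centered values.
    numerator = np.sum(x_centered * y_centered)
    # Denominator: product of the square roots of the sums of squares.
    denominator = np.sqrt(np.sum(x_centered ** 2)) * np.sqrt(np.sum(y_centered ** 2))
    return numerator / denominator

# Hypothetical example values, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# A mean-centered vector has a sample mean of zero.
assert np.isclose(np.mean(x - np.mean(x)), 0.0)

r = pearson_r(x, y)
# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r_xy.
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
print(r)
```

Note that the denominator is zero when either vector is constant, so this sketch does not handle that case.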