This document provides an overview of basic statistical tools for data science.
Andrew L. Mackey
Statistics and probability are important tools for the analysis of data.
data <- c(1, 2, 3, 4, 5)

# Mean: built-in, then manual equivalent
mean(data)
sum(data) / length(data)

# Sample variance: built-in, then manual equivalent
var(data)
sum( (data - mean(data))^2 ) / (length(data) - 1)

# Standard deviation: built-in, then manual equivalent
sd(data)
sqrt( var(data) )
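As a quick sanity check (a minimal sketch using the same sample vector), each manual formula agrees with its built-in counterpart:

```r
data <- c(1, 2, 3, 4, 5)

# Each built-in matches the manual computation
stopifnot(all.equal(mean(data), sum(data) / length(data)))                         # mean = 3
stopifnot(all.equal(var(data),  sum((data - mean(data))^2) / (length(data) - 1)))  # var = 2.5
stopifnot(all.equal(sd(data),   sqrt(var(data))))                                  # sd = sqrt(2.5)
```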
To estimate the coefficients, we can use the least-squares approach. Given the linear model

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon $$

the least-squares estimate of the coefficients is

$$ \hat{\beta} = (\mathbf{x}^T \mathbf{x})^{-1} \, \mathbf{x}^T \mathbf{y} $$

We can use the following code in R to estimate the coefficients manually.
# x: design matrix (with bias column), y: response vector
betahat <- solve( t(x) %*% x ) %*% t(x) %*% y
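As a sanity check (a sketch using a small made-up dataset, invented here for illustration), the manual normal-equation solution should match the coefficients returned by R's built-in `lm()`:

```r
# Hypothetical example data (not from the text above)
x1 <- c(1, 2, 3, 4, 5)
y  <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Design matrix with a leading bias (intercept) column
X <- cbind(1, x1)

# Manual least-squares estimate via the normal equations
betahat <- solve( t(X) %*% X ) %*% t(X) %*% y

# Built-in fit for comparison
fit <- lm(y ~ x1)

print(betahat)
print(coef(fit))
```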
Once we obtain the \( \beta \) weights for our model (either by solving for them analytically or through gradient descent), we can have the model estimate/predict values for given input data:
$$ h_\beta(\mathbf{x}) = \hat{y} = \mathbf{x} \beta $$

The hypothesis function \( h_\beta(\mathbf{x}) \) (whose outputs are often denoted \( \hat{y} \)) accepts a record \( \mathbf{x} = \begin{bmatrix} 1 & x_1 & x_2 & \dots & x_p \end{bmatrix} \) and multiplies it by the corresponding weights \( \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} \).
#****************************************************
#* Define a function named "predict" that serves as
#* the hypothesis function.
#*
#* @param w a vector of weights
#* @param x a vector of inputs
#****************************************************
predict <- function(w, x) {
  result <- x %*% w   # matrix multiplication
  return(result)
}

# Example weights (betas)
w <- c(100, 0.5, 2.0)

# Example input vector
x <- c(1, 30, 2)

# Element-wise multiplication, then sum
yhat <- sum(w * x)
print(yhat)

# Matrix multiplication
yhat <- predict(w, x)
print(yhat)
Suppose that we had a dataset with two features, \(\mathbf{x}_1\) and \(\mathbf{x}_2\). We will define the following sample data comprising two features and four records, giving a matrix of dimension \( 4 \times 2 \):
$$\mathbf{X} = \begin{bmatrix} 10 & 20 \\ 30 & 40 \\ 50 & 60 \\ 70 & 80 \end{bmatrix}$$

mydata <- c(10, 20, 30, 40, 50, 60, 70, 80)
X <- matrix(data = mydata, nrow = 4, ncol = 2, byrow = TRUE)
Next, we need to add the bias column to our matrix:
$$\mathbf{X} = \begin{bmatrix} 1 & 10 & 20 \\ 1 & 30 & 40 \\ 1 & 50 & 60 \\ 1 & 70 & 80 \end{bmatrix}$$

biasweight <- 1
bias <- replicate(n = 4, expr = biasweight)
X <- cbind(bias, X)
We can now predict all \(n\) records in \(\mathbf{X}\) using matrix multiplication as \(\mathbf{X} \times \mathbf{w}\):
yhats <- predict(w,X)
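Putting the pieces together with the example weights from earlier, each row of \(\mathbf{X}\) produces one prediction:

```r
# Weights and design matrix as built above
w <- c(100, 0.5, 2.0)
X <- matrix(c(10, 20, 30, 40, 50, 60, 70, 80), nrow = 4, ncol = 2, byrow = TRUE)
X <- cbind(1, X)        # prepend the bias column

# One prediction per record: X %*% w
yhats <- X %*% w
print(yhats)            # 145, 195, 245, 295
```

Note that the first prediction, for example, is \(100 + 0.5 \cdot 10 + 2 \cdot 20 = 145\).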