Supervised Learning

Supervised learning is a learning strategy in which the training data is paired with labels, so the training process is supervised by the labels it has been provided.

Linear Regression

Linear regression is a statistical method used to find the relationship between a dependent variable and one or more independent variables by fitting a straight line through the data. This technique is mainly used for Regression tasks.
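A minimal sketch in NumPy (the toy data below is invented for illustration): fit a degree-1 polynomial, i.e. a straight line, by ordinary least squares.

```python
import numpy as np

# Toy data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.shape)

# np.polyfit with deg=1 fits y = w*x + b by least squares.
w, b = np.polyfit(x, y, deg=1)
print(f"slope ~ {w:.2f}, intercept ~ {b:.2f}")  # close to 2 and 1
```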

Logit Model

Although it is not designed for Classification tasks (one can even argue whether that would make sense), Linear Regression can be adapted for them. In binary classification, for instance, the target variable can be encoded as 0s and 1s, and a threshold applied to the regression output to assign each prediction to a class.

If the Linear Regression model is instead modified by applying the Sigmoid function, it outputs a probability between 0 and 1:

$$ S(x) = \frac{1}{1 + e^{-x}} $$
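A small sketch of this adaptation (the raw linear outputs below are made-up numbers): squash the linear model's output through the sigmoid, then threshold at 0.5 to assign a class.

```python
import numpy as np

def sigmoid(x):
    # S(x) = 1 / (1 + e^(-x)) squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

linear_output = np.array([-2.0, 0.3, 4.0])    # hypothetical model outputs
probabilities = sigmoid(linear_output)        # ~[0.12, 0.57, 0.98]
classes = (probabilities >= 0.5).astype(int)  # -> [0, 1, 1]
print(probabilities, classes)
```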

Logistic Regression
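This combination is logistic regression: a linear model whose output is passed through the sigmoid so it can be read as the probability of the positive class, typically trained with the cross-entropy loss. A minimal sketch using scikit-learn (assuming scikit-learn is available; the toy data is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: class 1 becomes more likely as x grows.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[2.0]]))  # [P(class 0), P(class 1)] at x = 2
print(clf.predict([[2.0]]))        # label after thresholding at 0.5
```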

Support Vector Machines (SVMs)
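In brief: a support vector machine separates classes with the hyperplane that maximizes the margin, i.e. the distance to the closest training points (the support vectors); kernel functions extend the idea to non-linear decision boundaries.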

Unsupervised Learning

This is learning where there are no labels for the given data. Learning is based solely on the model's understanding of the structure and patterns in the data itself.

Regression

Prediction of a continuous target variable. This is a method for estimating a numeric value for a given problem.

Classification

Prediction of a binary or multi-class target. In essence, classification assigns the target to a label.

Loss Function

A loss function is used for measuring how inaccurate the model’s predictions are. In a nutshell, it is a mathematical tool designed for quantifying how error-prone the predictions were compared to the ground truth.

Cost Function

A loss function measures the error for a single data point, while a cost function is the average of the loss function values over the entire dataset.
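As a concrete instance with the common squared-error loss (standard notation, not taken from this document): the loss for one sample is $\ell_i = (\hat{y}_i - y_i)^2$, and the corresponding cost is the mean squared error over all $N$ samples:

$$ J = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 $$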

Gradient Descent

The goal here is to find the minimum of our model’s loss function by iteratively stepping the parameters downhill, obtaining a better and better approximation of that minimum.
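Concretely, each parameter $\theta$ is updated against the gradient of the cost $J$, scaled by a learning rate $\alpha$:

$$ \theta \leftarrow \theta - \alpha \, \nabla_\theta J(\theta) $$

A minimal sketch for a one-feature linear model trained with the mean squared error (the toy data and learning rate are made up for illustration):

```python
import numpy as np

# Toy data generated from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0  # initial parameters
alpha = 0.05     # learning rate (hand-picked for this sketch)

for _ in range(2000):
    y_hat = w * x + b
    # Gradients of J = mean((y_hat - y)^2) with respect to w and b.
    grad_w = 2.0 * np.mean((y_hat - y) * x)
    grad_b = 2.0 * np.mean(y_hat - y)
    w -= alpha * grad_w  # step against the gradient
    b -= alpha * grad_b

print(f"w ~ {w:.3f}, b ~ {b:.3f}")  # approaches 2 and 1
```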

Overfitting

Sometimes, training a model may lead to a function that explains the specific situations in the training set incredibly well but is a stranger to outside data. When unseen data is evaluated, the trained model will not predict properly because it was “over-trained”. A good model is generalized enough to answer questions about unseen data.

Underfitting

Here our model would be “too general”, meaning that training did not teach it enough, leading to results that do not capture the underlying trends in the data.

Regularization

Regularization prevents overfitting by introducing a penalty term into the loss function based on the model weights, so that large coefficients are penalized and the model is pushed toward simpler solutions.

L1 Regularization (Lasso)

Adds the sum of absolute values of the weights to the loss: $$ L = L_0 + \lambda \sum_i |w_i| $$

  • Encourages sparsity — some weights become zero.
  • Effectively performs feature selection.

L2 Regularization (Ridge)

Adds the sum of squared values of the weights to the loss:

$$ L = L_0 + \lambda \sum_i w_i^2 $$

  • Encourages small weights — reduces model complexity.
  • Distributes shrinkage across all parameters.

Elastic Net (Combined)

Uses both penalties to balance sparsity and weight decay:

$$ L = L_0 + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2 $$
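To see the difference between the three penalties in practice, here is a sketch using scikit-learn (assumed available; the data and regularization strengths are invented): only the first of five features matters, and the lasso tends to zero out the rest while ridge merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Toy data: only feature 0 actually drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

for model in (Lasso(alpha=0.1), Ridge(alpha=0.1), ElasticNet(alpha=0.1)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 3))
```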

Support Vector Machine