Important Terminologies for Classification in Machine Learning

Classification is the process of categorizing a given set of data into classes. It can be performed on both structured and unstructured data. The process starts with predicting the class of the given data points. The classes are often referred to as targets, labels, or categories.
In this blog, I will shed some light on the terms you need to know before performing classification with a machine learning model.
The following concepts are covered:
- Log loss
- Confusion matrix
- Precision matrix and recall matrix
- One hot encoding
- Response encoding
- Laplace smoothing
- SGD Classifier
- Calibrated classifier
Log Loss
Log loss is one of the most important classification metrics based on predicted probabilities. Raw log-loss values are hard to interpret on their own, but log loss is still a good metric for comparing models. For any given problem, a lower log-loss value means better predictions.
- Mathematical Interpretation:


Working through the table above gives the following results:

Log loss is a useful metric for comparing the quality of the predictions made by two machine learning models.
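As a rough illustration, here is a minimal sketch of binary log loss in plain Python. It assumes the usual definition, the mean of -[y*log(p) + (1-y)*log(1-p)], where y is the true label (0 or 1) and p is the predicted probability of class 1. The labels, the probabilities, and the names `confident_model` and `unsure_model` are made up for this example.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical true labels and predicted probabilities of class 1.
y_true = [1, 0, 1, 1]
confident_model = [0.9, 0.1, 0.8, 0.7]
unsure_model = [0.6, 0.4, 0.5, 0.5]

print("confident model:", log_loss(y_true, confident_model))
print("unsure model:   ", log_loss(y_true, unsure_model))
```

The model whose probabilities sit closer to the true labels gets the lower log loss, which is why the metric works well for comparing models.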
Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
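For a binary problem, the table boils down to four counts: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below counts them in plain Python. The labels are invented for the example, and the helper name `confusion_counts` is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

# Hypothetical true labels and model predictions on a test set.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)

# Rows are actual classes, columns are predicted classes.
print("           predicted 1  predicted 0")
print(f"actual 1   {counts['TP']:>11}  {counts['FN']:>11}")
print(f"actual 0   {counts['FP']:>11}  {counts['TN']:>11}")
```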

Precision and Recall Matrix
Precision tells us how many of the cases predicted as positive actually turned out to be positive.

Recall tells us how many of the actual positive cases we were able to predict correctly with our model.
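In terms of confusion matrix counts, precision is TP / (TP + FP) and recall is TP / (TP + FN). Here is a small sketch with made-up counts. Returning 0.0 when the denominator is zero is just a convention chosen for this example.

```python
def precision(tp, fp):
    """Share of predicted positives that are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of actual positives that the model found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts: 30 true positives, 10 false positives, 20 false negatives.
tp, fp, fn = 30, 10, 20

print("precision:", precision(tp, fp))  # 30 out of 40 predicted positives
print("recall:   ", recall(tp, fn))     # 30 out of 50 actual positives
```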

One Hot Encoding
One-hot encoding turns each categorical feature into a set of binary columns. The input to a one-hot encoder is a matrix of values taken on by categorical (discrete) features, typically encoded as integers. The output is a matrix, often stored as a sparse matrix, where each column corresponds to one possible value of one feature.

The above table when encoded with 1s and 0s becomes:

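To make the idea concrete, here is a minimal sketch of one-hot encoding for a single feature in plain Python. The `colors` column and the helper name `one_hot_encode` are invented for this example, and the result is a plain dense list of lists rather than a sparse matrix.

```python
def one_hot_encode(values):
    """Return (categories, rows) with one 0/1 column per distinct value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows

# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains a single 1, in the column that matches its original value.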
Response Encoding
When a categorical feature has a very large number of distinct values, one-hot encoding creates a huge number of columns and hence increases computation time.
In such cases, we can use response encoding, where each category value is replaced by the probability of each class given that value. The probability is estimated from the number of times the value occurs together with that class in the training data.
The more often a category value appears with a given class, the higher that probability, and the more strongly the encoded feature points the model toward that class.
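A minimal sketch of this idea in plain Python is shown below. It estimates P(class | category) from raw counts. The `cities` feature, the labels, and the helper name `fit_response_encoding` are hypothetical.

```python
from collections import Counter, defaultdict

def fit_response_encoding(categories, labels):
    """Estimate P(class | category) from training counts.

    Returns (encoding, classes), where encoding[category] is a list of
    probabilities in the same order as classes.
    """
    counts = defaultdict(Counter)
    for category, label in zip(categories, labels):
        counts[category][label] += 1
    classes = sorted(set(labels))
    encoding = {}
    for category, class_counts in counts.items():
        total = sum(class_counts.values())
        encoding[category] = [class_counts[c] / total for c in classes]
    return encoding, classes

# Hypothetical training data: one categorical feature and a binary label.
cities = ["A", "A", "A", "B", "B", "C"]
labels = [1, 1, 0, 0, 0, 1]
encoding, classes = fit_response_encoding(cities, labels)

print("class order:", classes)
for city in sorted(encoding):
    print(city, encoding[city])
```

However many distinct cities there are, each one becomes just one number per class. In practice the probabilities should be learned from the training split only, so that labels from the test data do not leak into the features.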

Laplace Smoothing
Laplace smoothing is a smoothing technique that helps tackle the problem of zero probability in the Naïve Bayes machine learning algorithm.
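The usual form of the fix is to add a small constant alpha to every count, so the estimate becomes (count + alpha) / (total + alpha * K), where K is the number of possible feature values. The sketch below assumes that formula. The counts and the helper name `smoothed_probability` are made up for illustration.

```python
def smoothed_probability(count, total, n_values, alpha=1.0):
    """Laplace-smoothed estimate: (count + alpha) / (total + alpha * n_values)."""
    return (count + alpha) / (total + alpha * n_values)

# Hypothetical case: a word never seen in this class (count = 0),
# 100 word occurrences in the class, and a vocabulary of 50 words.
unsmoothed = 0 / 100
smoothed = smoothed_probability(count=0, total=100, n_values=50)

print("without smoothing:", unsmoothed)
print("with smoothing:   ", smoothed)
```

Without smoothing, a single zero would wipe out the whole product of probabilities in Naïve Bayes. With smoothing, the unseen value gets a small but non-zero probability.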

SGD Classifier
The Stochastic Gradient Descent (SGD) classifier is a linear classifier that updates its parameters so that the prediction error (the loss) decreases with each iteration. When log loss is used as the objective, it repeatedly computes the loss on training examples and adjusts the parameters to reduce it, until it arrives at the best model it can find.
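The update loop can be sketched in plain Python as follows. This is only a toy illustration of stochastic gradient descent on log loss for a linear model, not the implementation used by any particular library. The data, the learning rate, the number of epochs, and the function names are all assumptions made for this example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Probability of class 1 from a linear score w.x + b."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)

def mean_log_loss(w, b, X, y, eps=1e-15):
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = min(max(predict_proba(w, b, x_i), eps), 1 - eps)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
    return total / len(y)

def sgd_fit(X, y, lr=0.1, epochs=100, seed=0):
    """Update the parameters one training example at a time."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in random order
        for i in order:
            # Gradient of the log loss with respect to the linear score.
            error = predict_proba(w, b, X[i]) - y[i]
            w = [w_j - lr * error * x_j for w_j, x_j in zip(w, X[i])]
            b -= lr * error
    return w, b

# Hypothetical two-feature dataset with two well-separated classes.
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.7, 1.0], [1.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]

loss_before = mean_log_loss([0.0, 0.0], 0.0, X, y)
w, b = sgd_fit(X, y)
loss_after = mean_log_loss(w, b, X, y)

print("log loss before training:", loss_before)
print("log loss after training: ", loss_after)
```

The log loss after training should come out lower than the starting value, which is the behavior described above.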

Calibrated Classifier
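A calibrated classifier is one whose predicted probabilities can be read as real-world likelihoods. A common way to calibrate a model is to pass its raw scores through a sigmoid function. A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve. A common example of a sigmoid function is the logistic function, sigmoid(x) = 1 / (1 + e^(-x)).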
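Here is a small sketch of the logistic function in plain Python, showing how it squashes any raw score into a value between 0 and 1. The scores are arbitrary example inputs. In sigmoid-based calibration, the score would typically be rescaled with fitted parameters before being passed through the function, a step this sketch leaves out.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Arbitrary raw scores mapped onto the S-shaped curve.
for score in [-4, -1, 0, 1, 4]:
    print(score, round(sigmoid(score), 3))
```

A score of 0 maps to exactly 0.5, large negative scores approach 0, and large positive scores approach 1.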

I have tried to explain the above terms and techniques as briefly and efficiently as possible. If you want to explore them in depth, please feel free to look through YouTube tutorials and other websites :)
