
Overfitting and Underfitting


What is Overfitting?

An overfitting scenario is when a model performs very well on training data but poorly on test data. The noise that the model learns along with the genuine patterns has a detrimental impact on its performance on test data. The overfitting issue typically arises when using nonlinear models that produce a nonlinear decision boundary. In an SVM, for instance, the decision boundary could be a linearly separable line or a hyperplane.

(Image: Overfitting)

As is evident in this instance, the pattern the model has learned is nonlinear, and its results cannot be generalised to new data.

Nonlinear models, such as decision trees, frequently overfit through the decision boundary they produce. This is also known as a high variance problem. Using target shooting as an analogy, high variance is comparable to an unsteady aim: the shots scatter widely around the target. Overfitting results in a large validation/test error and a comparatively tiny training error.
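
As a rough, hedged illustration (not from the original article), the short Python sketch below uses scikit-learn on synthetic noisy data to show the symptom described above: an unconstrained decision tree reaches near-perfect accuracy on the training data but noticeably lower accuracy on held-out test data.

```python
# A minimal sketch of overfitting: an unconstrained decision tree
# memorises noisy training data but generalises poorly.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (flip_y adds label noise).
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42)  # no depth limit -> high variance
tree.fit(X_train, y_train)

print("Train accuracy:", tree.score(X_train, y_train))  # typically close to 1.0
print("Test accuracy :", tree.score(X_test, y_test))    # noticeably lower
```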

Reasons for Overfitting:

  • Data preprocessing was not done properly and noise remains in the data.
  • An overfit model is also said to have high variance.
  • The model has too many parameters to learn.
  • The model memorises the training data instead of learning the underlying patterns.

To confront overfitting:

Using K Fold Cross-Validation:

Cross-validation is the ideal preventive approach to address overfitting. The entire dataset is split into k sets, each of roughly equal size. In the first iteration, the algorithm trains on k-1 of the sets, uses the remaining set as test data, and the test error is calculated.


In the second iteration, the second set is chosen as the test set, the remaining k-1 sets are used as training data, and the test error is calculated again.

The procedure repeats until each of the k sets has served as the test set.

The method for K=5 is illustrated in the image below.


(Image: K-fold cross-validation with K=5)

Either way, we can tune the number of folds to select the k that best addresses overfitting.
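
As a minimal sketch, assuming scikit-learn is available, the snippet below runs 5-fold cross-validation: each fold takes a turn as the test set while the remaining k-1 folds train the model, and the test error is averaged across folds. The dataset and model are purely illustrative.

```python
# A minimal sketch of k-fold cross-validation (here K=5) with scikit-learn.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
model = LinearRegression()

# Each fold is used once as the test set; the other k-1 folds train the model.
scores = cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_squared_error")
print("Test MSE per fold:", -scores)
print("Average test MSE :", -scores.mean())
```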

  • Using sufficient training data:

    Training on more data helps, but it will not always work on its own. We can also try a less powerful model with fewer parameters, and data augmentation can sometimes help as well.

  • Quantity of features:

    Overfitting may be avoided by doing feature engineering and feature selection.

    We often add more features to the model in an effort to increase its accuracy, but doing so may overcomplicate the model and cause overfitting.

  • Regularisation:

    Regularisation keeps the parameter values as small as possible so that the model stays as simple as feasible; a suitably regularised model often generalises better than the initial, overly complex one. By shrinking the parameters, the regularisation approach prevents the model from overlearning the patterns in the data. The tuning parameter (hyperparameter) is what helps in getting the proper fit, and different machine learning algorithms have different hyperparameters: for instance, dropout in neural networks; the pruning strategy, ccp_alpha, and maximum tree depth in decision trees; and the L1/L2 norms in regression. A brief sketch of an L2 penalty follows this list.

    Please click the following link to learn about pruning techniques.

    https://360digitmg.com/decision-trees-and-its-algorithms

  • Adopting ensemble techniques:

    Ensemble techniques such as bagging (for example, random forests) and boosting can be used to solve variance problems.
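
For the regularisation point above, here is a hedged sketch (illustrative only, using scikit-learn's ridge regression) of how an L2 penalty shrinks coefficients and keeps the model simpler. The alpha value is just an example of the tuning parameter mentioned earlier; in practice it would be tuned, for instance with the cross-validation described above.

```python
# A minimal sketch of L2 regularisation (ridge regression) shrinking coefficients.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=100, n_features=30, noise=15.0, random_state=1)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha is the regularisation (penalty) strength

# A larger alpha pushes coefficients towards zero, giving a simpler, less overfit model.
print("Unregularised mean |coefficient|:", abs(plain.coef_).mean())
print("Ridge mean |coefficient|        :", abs(ridge.coef_).mean())
```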

What is Underfitting?

Underfitting occurs when a model does not learn the patterns in the training data well enough to generalise to unseen data. The model learns the relationship between the input and output variables inaccurately. This happens when the model is overly simplistic or needs more training time, more input features, and so on. Both the training error and the validation/test error are large.

The model generates forecasts that are inaccurate and off the mark. Compared to overfitting, underfitting is not as big a problem because it can be readily fixed. Applying the algorithms to data sets that are too small can also lead to inaccurate predictions.
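
As a hedged illustration (not part of the original article), the sketch below fits a straight line to data that actually follows a quadratic relationship; because the model is too simple, the error stays large on the training data and the test data alike, which is the signature of underfitting.

```python
# A minimal sketch of underfitting: a straight line fitted to quadratic data
# leaves a large error on both the training set and the test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)  # quadratic relationship plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
line = LinearRegression().fit(X_train, y_train)

print("Train MSE:", mean_squared_error(y_train, line.predict(X_train)))  # large
print("Test MSE :", mean_squared_error(y_test, line.predict(X_test)))    # also large
```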


Reasons for Underfitting:

  • The model has high bias.
  • The training data is not sufficient for the model to learn the patterns.
  • The model is too simple.
  • Data cleansing was not performed properly, so the model cannot capture the relationship between variables.
  • Noise in the data can also be one of the reasons for underfitting.

To confront Underfitting:

Adding more features to the data:

By including more input features, we can make the model more expressive and better capture the relationship between the variables. We can try this out by building polynomial models of degree 2, degree 3, and so on.

Underfitting can be fixed by adding inputs in this incremental manner. For instance, increasing the number of hidden neurons in a neural network or the number of trees in a random forest adds complexity to the model and improves training results.
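
A minimal sketch of this idea, assuming scikit-learn and a purely illustrative quadratic dataset: adding polynomial features of increasing degree gives the model enough capacity to capture the relationship.

```python
# A minimal sketch of adding polynomial features to reduce underfitting.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)

for degree in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(f"degree={degree}  R^2 on training data: {model.score(X, y):.3f}")
# The degree-1 (straight line) model underfits; degree 2 captures the quadratic pattern.
```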

Increase the duration of training:

If we stop training too soon, the algorithm cannot learn the patterns completely. It is important to find the right stopping point, because training for too long can instead run into overfitting. In neural networks, for example, we can increase the number of epochs.
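
As a rough sketch of the same idea (illustrative sizes and iteration counts only), training a small scikit-learn neural network for more iterations moves it from underfitting towards a better fit on the training data.

```python
# A minimal sketch: more training iterations (epochs) can cure underfitting.
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=2)

for max_iter in (10, 100, 1000):
    net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=max_iter, random_state=2)
    net.fit(X, y)  # may warn that it has not converged for the smaller max_iter values
    print(f"max_iter={max_iter:5d}  training R^2: {net.score(X, y):.3f}")
```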

Regularisation:

Regularisation helps lower the variance of a model by imposing a penalty on the parameters with the largest coefficients. A variety of methods, including L1/L2 regularisation, can be used to reduce the influence of noise and outliers. If the regularisation is too strong, however, the model will not be able to recognise the prevailing trend in the data, which results in underfitting. Reducing the amount of regularisation restores complexity and variance to the model, enabling it to train effectively.
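
A brief, hedged sketch of this point, with illustrative values only: weakening the ridge penalty (a smaller alpha) lets the model fit the training data more closely, which is what we want when it is underfitting.

```python
# A minimal sketch: reducing the regularisation strength restores model complexity.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=3)

for alpha in (1000.0, 10.0, 0.1):
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:7.1f}  training R^2: {model.score(X, y):.3f}")
# Very strong regularisation (large alpha) keeps the fit too simple; reducing
# alpha gives the model enough complexity to learn the trend.
```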

What is the best fit in Machine Learning?

The best-fit scenario is when the model predicts with as little error as possible on unseen data. From the charts below we can infer that the model initially fails to capture the relationship between x and y. We then add features to improve the pattern learning. To reduce underfitting we keep adding features, which will eventually make the model more complex and result in overfitting.

Alternatively, as training time grows and more inputs are added, the error on both the training data and the test data is likely to decrease at first. If this persists and the data is trained for too long, the model will become overfitted.

(Charts: the progression from underfitting, through the best fit, to overfitting)

So choosing the right set of features, the right amount of training, and the right regularisation penalty terms will help us achieve the RIGHT fit, or the best fit.
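
To make this concrete, here is a hedged sketch (illustrative data and degrees only) that compares training and test error across polynomial degrees; the degree with the lowest test error is the best fit, while very low and very high degrees tend towards underfitting and overfitting respectively.

```python
# A minimal sketch of locating the best fit by comparing train and test error
# across models of increasing complexity (polynomial degree).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
# Typically the degree-1 model underfits (both errors high), a very high degree
# drives the training error down while the test error stops improving or worsens
# (overfitting), and a moderate degree gives the best balance.
```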


Conclusion:

We try to find the ideal balance of bias and variance for every model. This ensures that we capture the key patterns in our data while disregarding the noise. This balance is described as the bias-variance tradeoff, and it helps keep the model's error as low as feasible.

A well-optimised model is sensitive to the patterns in our data while also being able to generalise to new data. Such a model has modest bias and modest variance, which lets it avoid both overfitting and underfitting. Therefore, achieving low bias and low variance is our goal.
