How to discover patterns that generalize is the fundamental problem of machine learning.
Our predictions will only be useful if our model has truly discovered a general pattern.
When working with finite samples, we run the risk that we might discover apparent associations that turn out not to hold up when we collect more data.
The phenomenon of fitting our training data more closely than we fit the underlying distribution is called overfitting, and the techniques used to combat overfitting are called regularization.
1. Training Error and Generalization Error
The training error is the error of our model as calculated on the training dataset,
while generalization error is the expectation of our model's error were we to apply it to an infinite stream of additional data examples drawn from the same underlying data distribution as our original sample.
In practice, we can never calculate the generalization error exactly; instead, we estimate it by applying the model to an independent test set, selected at random and withheld from training.
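A minimal sketch of this estimation, assuming scikit-learn and a synthetic dataset (the data and model below are placeholders of my own, not from the book): fit on the training split, then treat the error on a randomly held-out test split as the estimate of generalization error.

```python
# Estimating generalization error with a randomly selected, held-out test set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                     # synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)       # synthetic labels

# The test set is selected randomly and never used for fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
train_error = 1 - model.score(X_train, y_train)     # error on the training set
test_error = 1 - model.score(X_test, y_test)        # estimate of generalization error
print(f"training error: {train_error:.3f}, test error: {test_error:.3f}")
```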
In standard supervised learning, we assume that both the training data and the test data are drawn independently from the same distribution (the i.i.d. assumption).
Performing well on the training dataset does not guarantee that the model will perform well in general. Several factors influence how susceptible a model is to overfitting:
- When the number of tunable parameters is large, models tend to be more susceptible to overfitting.
- When the weights can take a wider range of values (i.e., the model is more expressive), models can be more susceptible to overfitting.
- Even a simple model can overfit when the training dataset is small; overfitting a huge dataset requires an extremely flexible model. In general, more data never hurts (see the sketch after this list).
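A rough sketch of the last two points, using NumPy polynomial fitting on synthetic data (everything here is an illustrative assumption, not from the book): on a small noisy sample, a high-degree polynomial can drive the training error toward zero while the test error grows, whereas a low-degree fit cannot.

```python
# Model capacity vs. dataset size: a flexible (high-degree) polynomial
# overfits a small noisy training sample.
import numpy as np

rng = np.random.default_rng(0)
n = 20                                               # small training set
x = rng.uniform(-1, 1, size=n)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)  # noisy target

x_test = rng.uniform(-1, 1, size=1000)
y_test = np.sin(np.pi * x_test) + rng.normal(scale=0.2, size=1000)

for degree in (1, 3, 10):
    coeffs = np.polyfit(x, y, deg=degree)            # fit polynomial of this degree
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

With these settings the degree-10 fit should show the smallest training error but the largest test error, which is exactly the overfitting gap discussed below.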
2. Underfitting and Overfitting
When both the training and validation errors are substantial but the gap between them is small, our model may be too simple to capture the underlying pattern; this is underfitting.
When our training error is significantly lower than our validation error, we are facing severe overfitting.
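As a toy diagnostic (my own sketch, not from the book), the two cases can be told apart by comparing the training and validation errors; the thresholds `high_err` and `gap` are arbitrary assumptions for illustration.

```python
# Flag likely underfitting or overfitting from train/validation errors.
def diagnose(train_err: float, val_err: float,
             high_err: float = 0.2, gap: float = 0.05) -> str:
    """Thresholds high_err and gap are illustrative, not universal."""
    if train_err > high_err and val_err - train_err < gap:
        return "both errors high, small gap -> likely underfitting"
    if val_err - train_err > gap:
        return "validation error far above training -> likely overfitting"
    return "errors low and close -> model looks reasonable"

print(diagnose(train_err=0.30, val_err=0.32))  # underfitting
print(diagnose(train_err=0.02, val_err=0.25))  # overfitting
```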