
Deep Learning - 3.5 Weight Decay


Weight decay (commonly called L2 regularization) might be the most widely used technique for regularizing parametric machine learning models.

How should the model trade off the standard loss for this new additive penalty? In practice, we characterize this tradeoff via the regularization constant λ, a non-negative hyperparameter that we fit using validation data.
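Concretely, for a loss function L(w, b), the penalized objective takes the following standard form (the 1/2 factor is a common convention that simplifies the gradient):

$$\min_{\mathbf{w}, b} \; L(\mathbf{w}, b) + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2$$

Setting λ = 0 recovers the original loss, while larger values of λ constrain w more heavily.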

Given the penalty term alone, our optimization algorithm decays the weight at each step of training, which is where the name comes from.
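To see the decay explicitly, consider (as a standard derivation, not specific to this post) SGD with learning rate η: differentiating the penalty (λ/2)‖w‖² contributes λw to the gradient, so the update becomes

$$\mathbf{w} \leftarrow (1 - \eta\lambda)\,\mathbf{w} - \eta\,\partial_{\mathbf{w}} L(\mathbf{w}, b)$$

The factor (1 − ηλ) multiplicatively shrinks w toward zero on every step, before the usual gradient update is applied.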

Because weight decay is ubiquitous in neural network optimization, deep learning frameworks make it especially convenient, integrating weight decay into the optimization algorithm itself so that it can be combined with any loss function.
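As a minimal sketch of this in PyTorch (the model, data shapes, and hyperparameter values below are illustrative assumptions, not from the original post), passing weight_decay to the optimizer folds the λw term into each parameter's update:

```python
import torch
from torch import nn

# Illustrative model and data; shapes and values are assumptions.
net = nn.Linear(200, 1)
loss = nn.MSELoss()

# weight_decay adds weight_decay * w to each parameter's gradient inside
# the optimizer step, so it composes with any loss function unchanged.
trainer = torch.optim.SGD(net.parameters(), lr=0.01, weight_decay=3e-3)

X = torch.randn(32, 200)   # dummy minibatch of 32 examples
y = torch.randn(32, 1)

trainer.zero_grad()
l = loss(net(X), y)
l.backward()
trainer.step()             # for plain SGD this equals w = (1 - lr*wd)*w - lr*grad
```

Note that net.parameters() applies the decay to the bias as well; to exempt biases, PyTorch accepts per-parameter groups, each with its own weight_decay value.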
