This time we'll build a deeper neural network to classify the Fashion-MNIST dataset.
For the same classification problem, the implementation of an MLP is identical to that of softmax regression, except for the additional hidden layers with activation functions.
1. Model
Typically, we choose layer widths in powers of 2, which tend to be computationally efficient because of how memory is allocated and addressed in hardware.
Again, we will represent our parameters with several tensors. Note that for every layer, we must keep track of one weight matrix and one bias vector. As always, we allocate memory for the gradients of the loss with respect to these parameters.
import torch
from torch import nn

# Flatten each 28x28 image into a 784-dimensional vector, pass it
# through one hidden layer of 256 units with a ReLU activation,
# and map the result to the 10 Fashion-MNIST classes.
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

# Initialize every linear layer's weights from N(0, 0.01^2).
def init_weights(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)

batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss(reduction='none')
trainer = torch.optim.SGD(net.parameters(), lr=lr)
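As noted above, every layer keeps one weight matrix and one bias vector, and here PyTorch allocates them (and their gradients) for us inside each nn.Linear. A quick sanity check on the net defined above prints their shapes; note that PyTorch stores each weight as (out_features, in_features):

# nn.Flatten and nn.ReLU are parameter-free, so only the two
# nn.Linear layers (indices 1 and 3 in the Sequential) show up.
for name, param in net.named_parameters():
    print(name, tuple(param.shape))
# 1.weight (256, 784)
# 1.bias   (256,)
# 3.weight (10, 256)
# 3.bias   (10,)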
Recall that nn.CrossEntropyLoss combines the softmax operation and the cross-entropy loss in a single, numerically stable computation, so the network outputs raw logits rather than probabilities.
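The post ends before the training loop. As a minimal sketch of how these pieces fit together, assuming the standard torchvision Fashion-MNIST loader (the dataset path and DataLoader settings here are illustrative, not from the original):

import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

# Assumed data pipeline: the usual torchvision Fashion-MNIST dataset;
# './data' is an illustrative download location.
train_data = torchvision.datasets.FashionMNIST(
    root='./data', train=True, download=True,
    transform=transforms.ToTensor())
train_iter = DataLoader(train_data, batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    for X, y in train_iter:
        # net(X) returns raw logits; with reduction='none' the loss
        # is a vector of per-example losses, so average before backward.
        l = loss(net(X), y)
        trainer.zero_grad()
        l.mean().backward()
        trainer.step()
    print(f'epoch {epoch + 1}, last batch loss {l.mean().item():.4f}')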