
Deep Learning - 2.6 Implementation of Softmax Regression from Scratch


Now we will implement softmax regression from scratch, step by step:

  • Initializing model parameters
  • Defining the softmax operation
  • Defining the model
  • Defining the loss function
  • Computing classification accuracy
  • Training
  • Prediction

1. Initializing Model Parameters

Each example in the raw dataset is a 28 × 28 image. We will flatten each image into a vector of length 784 and treat each pixel location as just another feature.

Because our dataset has 10 classes, our network will have an output dimension of 10.

So our weights will constitute a 784 × 10 matrix, and the biases will constitute a 1 × 10 row vector.

We initialize our weights W with Gaussian noise and our biases with the initial value 0.
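A sketch of this initialization step, following the book's PyTorch implementation (the standard deviation 0.01 is the book's choice):

import torch

num_inputs = 784
num_outputs = 10

# Weights: Gaussian noise with mean 0 and std 0.01, shape (784, 10)
W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
# Biases: one per output class, initialized to 0
b = torch.zeros(num_outputs, requires_grad=True)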

 

2. Defining the Softmax Operation
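The softmax operation exponentiates each logit and divides by the sum of the exponentials in its row, so every output is nonnegative and each row sums to 1. A sketch following the book's implementation (note that exponentiating large logits can overflow, so this version is not numerically stable):

def softmax(X):
    X_exp = torch.exp(X)                    # exponentiate every entry
    partition = X_exp.sum(1, keepdim=True)  # row sums, shape (batch_size, 1)
    return X_exp / partition                # broadcasting: each row sums to 1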

 

3. Defining the Model
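The model flattens each input batch into a (batch_size, 784) matrix, applies the affine map XW + b, and pushes the result through softmax. A sketch following the book:

def net(X):
    # Flatten images to (batch_size, 784), then linear layer + softmax
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)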

4. Defining the Loss Function

Instead of using a Python for-loop to iterate over the predictions, we can compute the cross-entropy loss more efficiently with a single indexing operation.
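A sketch of this vectorized loss, assuming y holds integer class indices: fancy indexing picks out y_hat[i, y[i]] for every example i in one shot, and we take the negative log of those probabilities.

def cross_entropy(y_hat, y):
    # Pick the predicted probability of the true class for each example
    return -torch.log(y_hat[range(len(y_hat)), y])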

5. Classification Accuracy

Comparing the predicted classes with the true labels yields a tensor containing entries of 0 (false) and 1 (true). Taking the sum gives the number of correct predictions.

For example, if y_hat.argmax(axis=1) returns [2, 2] while the true labels y are [0, 2], only the second prediction is correct, so the accuracy is 0.5.
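A sketch of the accuracy helper this section describes, together with the Accumulator class and evaluate_accuracy function that the training loop below relies on, following the chapter's definitions:

def accuracy(y_hat, y):  #@save
    """Compute the number of correct predictions."""
    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
        y_hat = y_hat.argmax(axis=1)  # predicted class = largest logit
    cmp = y_hat.type(y.dtype) == y    # tensor of 0 (false) and 1 (true)
    return float(cmp.type(y.dtype).sum())

class Accumulator:  #@save
    """For accumulating sums over n variables."""
    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

def evaluate_accuracy(net, data_iter):  #@save
    """Compute the accuracy for a model on a dataset."""
    if isinstance(net, torch.nn.Module):
        net.eval()  # set the model to evaluation mode
    metric = Accumulator(2)  # no. of correct predictions, no. of predictions
    with torch.no_grad():
        for X, y in data_iter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]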

6. Training

First, we define a function to train for one epoch.

def train_epoch_ch3(net, train_iter, loss, updater):  #@save
    """The training loop defined in Chapter 3."""
    # Set the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, no. of examples
    metric = Accumulator(3)
    for X, y in train_iter:
        # Compute gradients and update parameters
        y_hat = net(X)
        l = loss(y_hat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Using PyTorch in-built optimizer & loss criterion
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Using custom built optimizer & loss criterion
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
    # Return training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]

 

The following training function then trains a model net on a training dataset accessed via train_iter for multiple epochs, as specified by num_epochs. It evaluates the test accuracy after each epoch, and the final assert statements are a sanity check that training actually converged.

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save
    """Train a model (defined in Chapter 3)."""
    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))
    train_loss, train_acc = train_metrics
    assert train_loss < 0.5, train_loss
    assert train_acc <= 1 and train_acc > 0.7, train_acc
    assert test_acc <= 1 and test_acc > 0.7, test_acc

We use minibatch stochastic gradient descent to optimize the loss function of the model, with a learning rate of 0.1.

lr = 0.1

def updater(batch_size):
    return d2l.sgd([W, b], lr, batch_size)
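For reference, d2l.sgd performs the parameter update in place, dividing by the batch size since the loss is summed over the minibatch; a sketch of its definition from the book:

def sgd(params, lr, batch_size):  #@save
    """Minibatch stochastic gradient descent."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size  # average over the batch
            param.grad.zero_()                     # reset gradients for reuse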

Now we train the model for 10 epochs.

num_epochs = 10
train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

7. Prediction

Now that training is complete, our model is ready to classify some images.

def predict_ch3(net, test_iter, n=6):  #@save
    """Predict labels (defined in Chapter 3)."""
    for X, y in test_iter:
        break  # grab a single batch of test examples
    trues = d2l.get_fashion_mnist_labels(y)
    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    d2l.show_images(
        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])
predict_ch3(net, test_iter)