

ComputerScience (329)
Speech Recognition - End to End Models for Speech Recognition Before E2E ASR The main purpose of ASR is to transcribe audio into text. The overall ASR process can be expressed as shown above. From the speaker's audio signal, we get the probability of each word (AM), and from the words we calculate whether each word is plausible in the specific sentence (LM). Based on those probabilities, we can finally decode the signal into text. Before E2E ASR, we had a lot of modules in the full proce..
Speech Recognition - MFCC The most important thing in deep learning is data. We are handling speech, so we first have to know how to extract features from audio. By extracting features from audio, we can represent the audio as a vector, which means a model is ready to learn from it. 1. MFCC (Mel-Spectrogram) Simply put, MFCC transforms audio data into a feature vector. By using the librosa Python library, we can simply extract..
Deep Learning - 3.1 Multilayer Perceptrons We will learn how to incorporate nonlinearities to build expressive multilayer neural network architectures. An MLP adds one or more fully-connected hidden layers between the input and output layers and transforms the output of each hidden layer via an activation function. Hidden Layers Activation Functions 1. Hidden Layers MLPs can capture complex interactions among our inputs via their hidden n..
Deep Learning - 2.7 Concise Implementation of Softmax Regression This time we will use the high-level APIs of a deep learning framework to implement softmax regression. First, load the Fashion-MNIST data: import torch from torch import nn import torchvision from torchvision import transforms from torch.utils import data import matplotlib.pyplot as plt import numpy # `ToTensor` converts the image data from PIL type to 32-bit floating point # tensors. It divides all number..
Deep Learning - 2.6 Implementation of Softmax Regression from Scratch Now we will implement softmax regression from scratch. Initialize Model Parameters Defining the Softmax Operation Defining the Model Defining the Loss Function 1. Initialize Model Parameters Each example in the raw dataset is a 28 × 28 image. We will flatten each image into a vector of length 784 and treat each pixel location as just another feature. Because our dataset has 10 classes, our network will ..
Deep Learning - 2.5 The Image Classification Dataset One of the widely used datasets for image classification is the MNIST dataset. Reading the Dataset Reading Minibatch 1. Reading the Dataset Download and read the Fashion-MNIST dataset into memory. Fashion-MNIST consists of images from 10 categories, each represented by 6000 images in the training dataset and by 1000 in the test dataset. The height and width of each input image are both 28 pixels..
Deep Learning - 2.4 Softmax Regression Regression is the hammer we reach for when we want to answer how much? or how many? questions. In practice, we are more often interested in classification: asking not “how much” but “which one”: Classification Network Architecture Initializing Model Parameters Parameterization Cost of Fully-Connected Layers Softmax Operation Vectorization for Minibatches Loss Function Softmax and Derivatives Cross-..
Deep Learning - 2.3 Concise Implementation of Linear Regression In this section, we will implement the same linear regression model as before, this time using the high-level APIs of deep learning frameworks. Generating Dataset Reading the Dataset (Minibatches) Defining the Model Initializing Model Parameters Defining the Loss Function Defining the Optimization Algorithm Training 1. Generating Dataset Generate the same dataset we made before. 2. Readin..
