
Paper Review (14)
[Paper Review] Denoising Diffusion Probabilistic Models
"We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound…" (arxiv.org)
[Paper Review] On decoder-only architecture for speech-to-text and large language model integration
"Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been explored…" (arxiv.org)
[Paper Review] Investigating Decoder-only Large Language Models for Speech-to-text Translation
"Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs…" (arxiv.org)
[Paper Review] Recent Advances in Speech Language Models: A Survey
"Large Language Models (LLMs) have recently garnered significant attention, primarily for their capabilities in text-based interactions. However, natural human interaction often relies on speech, necessitating a shift towards voice-based models…" (arxiv.org)
[Paper Review] Feature Unlearning for Pre-trained GANs and VAEs
"We tackle the problem of feature unlearning from a pre-trained image generative model: GANs and VAEs. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle…" (arxiv.org)
[Paper Review] Zipformer: A faster and better encoder for automatic speech recognition
"The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient…" (arxiv.org)
[Paper Review] Robust Speech Recognition via Large-Scale Weak Supervision
"We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard…" (arxiv.org)
[Paper Review] Conformer: Convolution-augmented Transformer for Speech Recognition
"Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global…" (arxiv.org)