- [Paper Review] Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding. Paper abstract (arxiv.org): Recently, pre-trained language models (PLMs) have been increasingly adopted in spoken language understanding (SLU). However, automatic speech recognition (ASR) systems frequently produce inaccurate transcriptions, leading to noisy inputs for SLU models, wh.. Review excerpt, 0. Abstract: 1. ASR errors are propagated to SL..
- [Paper Review] Investigating Decoder-only Large Language Models for Speech-to-text Translation. Paper abstract (arxiv.org): Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs.. Review excerpt, 0. Abstract: Integrate decoder-only LLMs into the task of sp..
- [Paper Review] Recent Advances in Speech Language Models: A Survey. Paper abstract (arxiv.org): Large Language Models (LLMs) have recently garnered significant attention, primarily for their capabilities in text-based interactions. However, natural human interaction often relies on speech, necessitating a shift towards voice-based models. A straightf.. Review excerpt, 0. Abstract: Natural human interaction often relies on speech, necessitating a shift..
- [Paper Review] Feature Unlearning for Pre-trained GANs and VAEs. Paper abstract (arxiv.org): We tackle the problem of feature unlearning from a pre-trained image generative model: GANs and VAEs. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from f.. Review excerpt, 0. Abstract: Feature unlearning simply means making a model exclude the production of s..
- [Paper Review] Zipformer: A faster and better encoder for automatic speech recognition. Paper abstract (arxiv.org): The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-p.. Review excerpt, 0. Abstract: The Conformer has become the most popular encoder m..
- [Paper Review] Robust Speech Recognition via Large-Scale Weak Supervision. Paper abstract (arxiv.org): We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual and multitask supervision, the resulting models generalize well to standard.. Review excerpt, 0. Abstract: Suggests large-scale, weakly supervised speech processing mode..
- [Paper Review] Conformer: Convolution-augmented Transformer for Speech Recognition. Paper abstract (arxiv.org): Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interac.. Review excerpt, 0. Abstract: Transformer models are good at capturing content-based ..
- [Paper Review] Sequence Transduction with Recurrent Neural Networks. Paper abstract (arxiv.org): Many machine learning tasks can be expressed as the transformation, or transduction, of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. On.. Review excerpt, 0. Abstract: Many machine learning tasks can be expressed as the transformation, or ..