Tags: Text-to-Speech Synthesis - Paper Library - SwiftScholar

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Published: 6/13/2023

Text-to-Speech Synthesis, Style Diffusion, Adversarial Training, Large Speech Language Models, LJSpeech Dataset

The paper presents StyleTTS 2, a TTS model that combines style diffusion with adversarial training using large speech language models. It achieves human-level synthesis, surpassing human recordings on the single-speaker LJSpeech dataset and matching them on the multispeaker VCTK dataset.

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Published: 1/5/2023

Neural Codec Language Models, Text-to-Speech Synthesis, Conditional Language Modeling, Zero-Shot Speech Synthesis, High-Quality Personalized Speech Synthesis

The paper presents VALL-E, a novel text-to-speech method using a neural codec language model. It reformulates TTS as conditional language modeling, achieving high-quality personalized speech synthesis with just 3 seconds of an unseen speaker's recording, and significantly improvi

Tacotron: Towards End-to-End Speech Synthesis

Published: 3/30/2017

End-to-End Speech Synthesis Model, Tacotron Model, Sequence-to-Sequence Learning, Text-to-Speech Synthesis, Generative Models in NLP

Tacotron is an end-to-end text-to-speech model that synthesizes speech directly from characters, simplifying complex traditional TTS pipelines. Trained from scratch, it achieves a mean opinion score (MOS) of 3.82 on a 5-point scale for US English, outperforming a production parametric system in naturalness, and it generates speech faster than sample-level autoregressive methods.

WaveNet: A Generative Model for Raw Audio

Published: 9/13/2016

Audio Generation Model, WaveNet Architecture, Text-to-Speech Synthesis, Autoregressive Modeling, Music Generation

WaveNet is a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive. It excels in text-to-speech tasks, surpassing existing systems in naturalness, and shows high realism in music generation while also achieving promising
