Text-to-Speech Synthesis
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Published: 6/13/2023
Text-to-Speech Synthesis, Style Diffusion, Adversarial Training, Large Speech Language Models, LJSpeech Dataset
The paper presents StyleTTS 2, a TTS model that combines style diffusion with adversarial training against large speech language models to reach human-level synthesis. It surpasses human recordings on the single-speaker LJSpeech dataset and matches them on the multi-speaker VCTK dataset.
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Published: 1/5/2023
Neural Codec Language Models, Text-to-Speech Synthesis, Conditional Language Modeling, Zero-Shot Speech Synthesis, High-Quality Personalized Speech Synthesis
The paper presents VALL-E, a novel text-to-speech method using a neural codec language model. It reformulates TTS as conditional language modeling, achieving high-quality personalized speech synthesis with just 3 seconds of an unseen speaker's recording, and significantly improvi
Tacotron: Towards End-to-End Speech Synthesis
Published: 3/30/2017
End-to-End Speech Synthesis Model, Tacotron Model, Sequence-to-Sequence Learning, Text-to-Speech Synthesis, Generative Models in NLP
Tacotron is an end-to-end text-to-speech model that synthesizes speech directly from characters, simplifying complex traditional TTS pipelines. Trained from scratch, it achieves a mean opinion score of 3.82 on a 5-point scale for US English, outperforming a production parametric system in naturalness. Because it generates speech at the frame level, it is also faster than sample-level autoregressive methods.
WaveNet: A Generative Model for Raw Audio
Published: 9/13/2016
Audio Generation Model, WaveNet Architecture, Text-to-Speech Synthesis, Autoregressive Modeling, Music Generation
WaveNet is a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive. It excels in text-to-speech tasks, surpassing existing systems in naturalness, and shows high realism in music generation while also achieving promising