by Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez
Abstract:
Expressive speech is one of the latest concerns of text-to-speech systems. Due to the subjectivity of expression and emotion realisation in speech, humans cannot objectively determine whether one system is more expressive than another. Most text-to-speech systems have a rather flat intonation and do not provide the option of changing the output speech. We therefore present an interactive intonation optimisation method based on pitch contour parameterisation and evolution strategies. The Discrete Cosine Transform (DCT) is applied to the phrase-level pitch contour. The genome is then encoded as a vector that contains the 7 most significant DCT coefficients. Based on this initial individual, new speech samples are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such as the initial standard deviation, the population size, the dynamic expansion of the pitch over the generations, and the naturalness and expressivity of the resulting individuals. The results have been evaluated on a Romanian parametric-based speech synthesiser and provide guidelines for the setup of an interactive optimisation system in which users can subjectively select the individual that best suits their expectations with a minimum amount of fatigue.
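The encoding step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the F0 contour is synthetic, "most significant" is assumed to mean the 7 lowest-order DCT coefficients, and new candidates are drawn with a plain isotropic Gaussian around the initial genome rather than with a full CMA-ES update. All function names are hypothetical.

```python
import math
import random

N_COEFFS = 7  # genome length stated in the abstract


def dct(signal):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(coeffs, n):
    """Inverse of dct(); coefficients beyond len(coeffs) are treated as zero."""
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def encode_genome(f0):
    """Assumption: keep the N_COEFFS lowest-order coefficients as the genome."""
    return dct(f0)[:N_COEFFS]


def sample_population(genome, sigma, size, rng):
    """Isotropic Gaussian sampling around the genome.

    This stands in for the sampling step only; it does not adapt a
    covariance matrix or step size as CMA-ES does.
    """
    return [[g + sigma * rng.gauss(0.0, 1.0) for g in genome]
            for _ in range(size)]


# Synthetic phrase-level F0 contour in Hz (illustrative values only).
f0 = [120.0 + 20.0 * math.sin(2.0 * math.pi * t / 50.0) for t in range(50)]

genome = encode_genome(f0)            # initial individual
smoothed = idct(genome, len(f0))      # contour rebuilt from 7 coefficients

# Hypothetical settings: initial standard deviation 5.0, population size 4.
population = sample_population(genome, sigma=5.0, size=4, rng=random.Random(0))
candidates = [idct(ind, len(f0)) for ind in population]
```

Each entry in `candidates` is a pitch contour that would be passed to the synthesiser so that a listener can pick the preferred sample. In an interactive setup the listener's choice would replace the numeric fitness function.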
Reference:
Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez, "Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis", In Proceedings of the 5th Workshop on Nature Inspired Cooperative Strategies for Optimisation, Springer, vol. 387, pp. 57-71, 2011.
Bibtex Entry:
@inproceedings{NICSO2011,
author = {Adriana Stan and Florin-Claudiu Pop and Marcel Cremene and
Mircea Giurgiu and Denis Pallez},
title = {{Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation
of the F0 Contour for Speech Synthesis}},
year = 2011,
abstract = {Expressive speech is one of the latest concerns of text-to-speech systems.
Due to the subjectivity of expression and emotion realisation in speech,
humans cannot objectively determine whether one system is more expressive than
another. Most text-to-speech systems have a rather flat intonation
and do not provide the option of changing the output speech. We therefore present
an interactive intonation optimisation method based on pitch contour
parameterisation and evolution strategies. The Discrete Cosine Transform
(DCT) is applied to the phrase-level pitch contour. The genome is then encoded
as a vector that contains the 7 most significant DCT coefficients. Based on this
initial individual, new speech samples
are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy
(CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such
as the initial standard deviation, the population size, the dynamic expansion of the pitch
over the generations, and the naturalness and expressivity of the resulting individuals.
The results have been evaluated on a Romanian parametric-based speech synthesiser and
provide guidelines for the setup of an interactive optimisation system in which
users can subjectively select the individual that best suits their expectations
with a minimum amount of fatigue.},
booktitle = {Proceedings of the 5th Workshop on Nature Inspired
Cooperative Strategies for Optimisation},
publisher = {Springer},
pages = {57--71},
url = {papers/2011_NICSO.pdf},
volume = 387,
doi = {10.1007/978-3-642-24094-2_4},
series = {Studies in Computational Intelligence}
}