Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis
by Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez
Abstract:
Expressive speech is one of the latest concerns of text-to-speech systems. Due to the subjectivity of expression and emotion realisation in speech, humans cannot objectively determine whether one system is more expressive than another. Most text-to-speech systems have a rather flat intonation and do not provide the option of changing the output speech. We therefore present an interactive intonation optimisation method based on pitch contour parameterisation and evolution strategies. The Discrete Cosine Transform (DCT) is applied to the phrase-level pitch contour. The genome is then encoded as a vector that contains the 7 most significant DCT coefficients. Starting from this initial individual, new speech samples are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such as the initial standard deviation, the population size, the dynamic expansion of the pitch over the generations, and the naturalness and expressivity of the resulting individuals. The results have been evaluated on a Romanian parametric speech synthesiser and provide guidelines for the setup of an interactive optimisation system, in which users can subjectively select the individual that best suits their expectations with a minimum amount of fatigue.
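The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "7 most significant" coefficients are the 7 lowest-order DCT-II coefficients, it uses a made-up F0 contour, and it replaces the CMA-ES sampling step with plain isotropic Gaussian mutation (full CMA-ES additionally adapts a covariance matrix and the step size). All function names are hypothetical.

```python
import math
import random

N_COEFFS = 7  # genome length stated in the abstract


def dct2(x):
    """Orthonormal DCT-II of a sequence of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c, n):
    """Inverse of dct2 for n samples; coefficients beyond len(c) count as zero."""
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, len(c)):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def encode_genome(f0):
    """Assumption: keep the lowest-order coefficients as the genome."""
    return dct2(f0)[:N_COEFFS]


def sample_offspring(genome, sigma, population_size, rng):
    """Simplified stand-in for CMA-ES sampling: isotropic Gaussian mutation."""
    return [
        [g + sigma * rng.gauss(0.0, 1.0) for g in genome]
        for _ in range(population_size)
    ]


# Hypothetical phrase-level F0 contour in Hz (illustrative values only).
f0 = [120.0 + 20.0 * math.sin(2.0 * math.pi * t / 50.0) for t in range(50)]

genome = encode_genome(f0)
rng = random.Random(0)
candidates = sample_offspring(genome, sigma=5.0, population_size=4, rng=rng)

# Each candidate genome is decoded back into a smooth F0 contour that a
# synthesiser could render for the listener to compare.
contours = [idct2(c, len(f0)) for c in candidates]
```

In an interactive setting, the listener's choice among the decoded contours would take the place of a numeric fitness function; the values of `sigma` and `population_size` here are arbitrary placeholders, not the settings evaluated in the paper.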
Reference:
Adriana Stan, Florin-Claudiu Pop, Marcel Cremene, Mircea Giurgiu, Denis Pallez, "Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation of the F0 Contour for Speech Synthesis", In Proceedings of the 5th Workshop on Nature Inspired Cooperative Strategies for Optimisation, Springer, vol. 387, pp. 57-71, 2011.
Bibtex Entry:
@inproceedings{NICSO2011,
  author = {Adriana Stan and Florin-Claudiu Pop and Marcel Cremene and 
                    Mircea Giurgiu and Denis Pallez},
  title ={{Interactive Intonation Optimisation Using CMA-ES and DCT Parametrisation 
                    of the F0 Contour for Speech Synthesis}},
  year = 2011,
  abstract = {Expressive speech is one of the latest concerns of text-to-speech systems. 
              Due to the subjectivity of expression and emotion realisation in speech, 
              humans cannot objectively determine whether one system is more expressive than
              another. Most text-to-speech systems have a rather flat intonation
              and do not provide the option of changing the output speech. We therefore present 
              an interactive intonation optimisation method based on the pitch contour 
              parameterisation and evolution strategies. The Discrete Cosine Transform 
              (DCT) is applied to the phrase-level pitch contour. Then, the genome is encoded
              as a vector that contains the 7 most significant DCT coefficients. Based on this
              initial individual, new speech samples 
              are obtained using an interactive Covariance Matrix Adaptation Evolution Strategy 
              (CMA-ES) algorithm. We evaluate a series of parameters involved in the process, such 
              as the initial standard deviation, population size, the dynamic expansion of the pitch 
              over the generations and the naturalness and expressivity of the resulting individuals.
              The results have been evaluated on a Romanian parametric-based speech synthesiser and 
              provide the guidelines for the setup of an interactive optimisation system, in which the 
              users can subjectively select the individual which best suits their expectations 
              with a minimum amount of fatigue.},
  booktitle = {Proceedings of the 5th Workshop on Nature Inspired 
                    Cooperative Strategies for Optimisation},
  publisher = {Springer},
  pages = {57--71},
  url = {papers/2011_NICSO.pdf},
  volume = 387,
  doi = {10.1007/978-3-642-24094-2_4},
  series = {Studies in Computational Intelligence}
  }