Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/27016

Title: A comparison of model validation techniques for audio-visual speech recognition
Authors: Seong, Thum W.
Ibrahim, M.Z.
Arshad, Nurul W.
Mulvaney, David J.
Keywords: Audio-visual speech recognition
Hidden Markov models
HTK toolkit
Holdout validation
Leave-one-out cross validation
Bootstrap validation
Issue Date: 2017
Publisher: © Springer
Citation: SEONG, T.W. ... et al, 2017. A comparison of model validation techniques for audio-visual speech recognition. IN: Kim K., Kim H. and Baek N. (eds). IT Convergence and Security 2017. ICITS 2017. Lecture Notes in Electrical Engineering, 449, pp. 112-119.
Series/Report no.: Lecture Notes in Electrical Engineering;449
Abstract: This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. Because ASR that relies on audio alone can be degraded by environmental noise, some applications may benefit from Audio-Visual Speech Recognition (AVSR), in which recognition uses both audio information and mouth movements obtained from a video recording of the speaker’s face region. Three model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented both to validate the performance of an AVSR system and to compare the validation techniques themselves. A new speech corpus is used, the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers, each uttering five sets of samples. The database is divided into training and testing sets and processed in a manner suitable for each validation technique under investigation. Performance is evaluated over a range of signal-to-noise ratios using a variety of noise types obtained from the NOISEX-92 dataset.
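Illustrative note: the three validation schemes named in the abstract differ mainly in how the corpus is divided into training and testing sets. The following is a minimal sketch of that division only, not the authors' procedure. It assumes the corpus can be indexed as (speaker, set) pairs for 10 speakers and five sets; the function names, the choice of sets 4 and 5 for holdout testing, and leaving out one set per round are all hypothetical choices made for illustration, and the paper may partition the data differently.

```python
import random

SPEAKERS = range(1, 11)  # 10 speakers, as stated in the abstract
SETS = range(1, 6)       # five sets of samples per speaker

# Each (speaker, set) pair stands in for one group of recordings.
ITEMS = [(spk, s) for spk in SPEAKERS for s in SETS]


def holdout_split(items, test_sets=(4, 5)):
    """Single fixed split: the chosen sets are tested, the rest are trained on.

    The default test sets are an assumption, not taken from the paper.
    """
    train = [it for it in items if it[1] not in test_sets]
    test = [it for it in items if it[1] in test_sets]
    return train, test


def leave_one_out_splits(items):
    """One round per set index: that set is tested, all others are trained on.

    Leaving out a whole set (rather than a sample or a speaker) is an assumption.
    """
    for held_out in sorted({s for _, s in items}):
        train = [it for it in items if it[1] != held_out]
        test = [it for it in items if it[1] == held_out]
        yield train, test


def bootstrap_split(items, rng):
    """Draw len(items) items with replacement; items never drawn form the test set."""
    train = [rng.choice(items) for _ in items]
    drawn = set(train)
    test = [it for it in items if it not in drawn]
    return train, test


if __name__ == "__main__":
    train, test = holdout_split(ITEMS)
    print("holdout:", len(train), "train /", len(test), "test")
    for i, (train, test) in enumerate(leave_one_out_splits(ITEMS), start=1):
        print("leave-one-out round", i, ":", len(train), "train /", len(test), "test")
    train, test = bootstrap_split(ITEMS, random.Random(0))
    print("bootstrap:", len(train), "train (with repeats) /", len(test), "test")
```

In a sketch like this, holdout yields one accuracy figure, leave-one-out yields one figure per round to be averaged, and bootstrap would normally be repeated over many resamples before averaging.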
Description: This is a pre-copyedited version of a contribution published in Kim K., Kim H. and Baek N. (eds), IT Convergence and Security 2017 (ICITS 2017), published by Springer. The definitive authenticated version is available online via https://doi.org/10.1007/978-981-10-6451-7_14
Sponsor: This work was supported by Universiti Malaysia Pahang and funded by the Ministry of Higher Education Malaysia under FRGS Grant RDU160108.
Version: Accepted for publication
DOI: 10.1007/978-981-10-6451-7_14
URI: https://dspace.lboro.ac.uk/2134/27016
Publisher Link: https://doi.org/10.1007/978-981-10-6451-7_14
ISBN: 9789811064500
ISSN: 1876-1100
Appears in Collections: Conference Papers and Presentations (Mechanical, Electrical and Manufacturing Engineering)

Files associated with this item:

File: A Comparison of Model Validation Techniques for Audio-Visual Speech Recognition - final submission.pdf
Description: Accepted version
Size: 162.79 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.