Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/12099

Title: A modified One-Class-One-Network ANN architecture for dynamic phoneme adaptation
Authors: Haskey, Stephen
Issue Date: 1998
Publisher: © Stephen Haskey
Abstract: As computers begin to pervade aspects of our everyday lives, so the problem of communication from man-to-machine becomes increasingly evident. In recent years, there has been a concerted interest in speech recognition, allowing a user to communicate freely with a machine. However, this deceptively simple means for exchanging information is in fact extremely complex. A single utterance can contain a wealth of varied information concerning the speaker's gender, age, dialect and mood. Numerous subtle differences such as intonation, rhythm and stress further add to the complexity, increasing the variability between inter- and intra-speaker utterances. These differences pose an enormous problem, especially for a multi-user system, since it is impractical to train for every variation of every utterance from every speaker. Consequently, adaptation is of great importance, allowing a system with limited knowledge to dynamically adapt towards a new speaker's characteristics. A new modified artificial neural network (ANN) was proposed, incorporating One-Class-One-Network (OCON) subnet architectures connected via a common front-end adaptation layer. Using vowel phonemes from the TIMIT speech database, the adaptation was concentrated on neurons within the front-end layer, resulting in only information common to all classes, primarily speaker characteristics, being adapted. In addition, this prevented new utterances from interfering with phoneme-unique information in the corresponding OCON subnets. Hence a more efficient adaptation procedure was created which, after adaptation towards a single class, also aided in the recognition of the remaining classes within the network. Compared with a conventional multi-layer perceptron network, results for inter- and intra-speaker adaptation showed an equally marked improvement for the recognition of adapted phonemes during both full neuron and front-layer neuron adaptation within the new modified architecture.
When testing the effects of adaptation on the remaining unadapted vowel phonemes, the modified architecture (allowing only the neurons in the front-end layer to adapt) yielded better results than the modified architecture allowing full neuron adaptation. These results highlighted the storing of speaker information, common to all classes, in the front-end layer allowing efficient inter- and intra-speaker dynamic adaptation.
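As an illustration of the idea described in the abstract (a shared front-end layer feeding one subnet per class, with adaptation restricted to the front-end layer), the following is a minimal sketch only. It is not the thesis's implementation: the names `OconNet` and `adapt_front`, the layer sizes, the learning rate, the squared-error loss, and the reduction of each OCON subnet to a single sigmoid unit are all assumptions made for illustration, and the input vector is made up rather than taken from TIMIT.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OconNet:
    """Toy network: one shared front-end layer feeding one subnet per class.

    Hypothetical simplification: each subnet is a single sigmoid unit.
    """

    def __init__(self, n_in, n_front, n_classes, seed=0):
        rng = random.Random(seed)
        # Shared front-end layer: one row per neuron, last entry is the bias.
        self.front = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                      for _ in range(n_front)]
        # One subnet per class: weights over front-end outputs, last entry is the bias.
        self.subnets = [[rng.uniform(-0.5, 0.5) for _ in range(n_front + 1)]
                        for _ in range(n_classes)]

    def front_out(self, x):
        # zip stops at len(x), so the trailing bias entry is added separately.
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                for row in self.front]

    def score(self, x, k):
        """Output of the subnet for class k on input x."""
        h = self.front_out(x)
        w = self.subnets[k]
        return sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + w[-1])

    def adapt_front(self, x, k, target, lr=0.5):
        """One gradient step on squared error for class k.

        Only the shared front-end weights are updated; all subnets stay frozen.
        """
        h = self.front_out(x)
        w = self.subnets[k]
        y = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + w[-1])
        delta_out = (y - target) * y * (1.0 - y)
        for j, row in enumerate(self.front):
            delta_h = delta_out * w[j] * h[j] * (1.0 - h[j])
            for i, xi in enumerate(x):
                row[i] -= lr * delta_h * xi
            row[-1] -= lr * delta_h


# Usage: adapt towards one class using a single made-up feature vector.
net = OconNet(n_in=3, n_front=4, n_classes=2)
x = [0.2, -0.1, 0.7]
subnets_before = [list(w) for w in net.subnets]
before = net.score(x, 0)
for _ in range(50):
    net.adapt_front(x, 0, target=1.0)
after = net.score(x, 0)
```

In this sketch the class-specific weights in `net.subnets` are untouched by adaptation, so any change in the score for class 0 comes entirely from the shared front-end layer, which also feeds the other subnet.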
Description: A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.
URI: https://dspace.lboro.ac.uk/2134/12099
Appears in Collections: PhD Theses (Mechanical, Electrical and Manufacturing Engineering)

Files associated with this item:

File                     Size       Format
Thesis-1998-Haskey.pdf   3.65 MB    Adobe PDF
Form-1998-Haskey.pdf     44.15 kB   Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.