A modified One-Class-One-Network ANN architecture for dynamic phoneme adaptation

Haskey, Stephen

Thesis-1998-Haskey.pdf (3.56 MB)

thesis

posted on 2013-04-11, 13:14 authored by Stephen Haskey

As computers begin to pervade aspects of our everyday lives, the problem of man-to-machine communication becomes increasingly evident. In recent years, there has been concerted interest in speech recognition as a means for a user to communicate freely with a machine. However, this deceptively simple means of exchanging information is in fact extremely complex. A single utterance can contain a wealth of varied information concerning the speaker's gender, age, dialect and mood. Numerous subtle differences such as intonation, rhythm and stress add further to the complexity, increasing the variability of both inter- and intra-speaker utterances. These differences pose an enormous problem, especially for a multi-user system, since it is impractical to train for every variation of every utterance from every speaker. Consequently, adaptation is of great importance, allowing a system with limited knowledge to adapt dynamically towards a new speaker's characteristics. A new modified artificial neural network (ANN) was proposed, incorporating One-Class-One-Network (OCON) subnet architectures connected via a common front-end adaptation layer. Using vowel phonemes from the TIMIT speech database, the adaptation was concentrated on neurons within the front-end layer, so that only information common to all classes, primarily speaker characteristics, was adapted. In addition, this prevented new utterances from interfering with phoneme-unique information in the corresponding OCON subnets. Hence a more efficient adaptation procedure was created which, after adaptation towards a single class, also aided the recognition of the remaining classes within the network. Compared with a conventional multi-layer perceptron network, results for inter- and intra-speaker adaptation showed an equally marked improvement in the recognition of adapted phonemes during both full neuron and front-layer neuron adaptation within the new modified architecture. When testing the effects of adaptation on the remaining unadapted vowel phonemes, the modified architecture (allowing only the neurons in the front-end layer to adapt) yielded better results than the modified architecture allowing full neuron adaptation. These results highlighted the storing of speaker information, common to all classes, in the front-end layer, allowing efficient inter- and intra-speaker dynamic adaptation.
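
The arrangement described in the abstract, a shared front-end layer feeding one subnet per class with adaptation restricted to the front-end weights, can be illustrated with a minimal NumPy sketch. The abstract gives no layer sizes, activation functions, loss or learning rule, so everything below is an assumption made for illustration only: the class name OconWithSharedFrontEnd, the sigmoid units, the single hidden layer per subnet, the squared-error loss and the plain gradient step are all hypothetical and are not taken from the thesis.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OconWithSharedFrontEnd:
    """Hypothetical sketch: a shared front-end layer feeding one subnet per class."""

    def __init__(self, n_inputs, n_front, n_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Shared front-end layer: the only weights changed during adaptation.
        self.W_front = rng.normal(0.0, 0.5, (n_front, n_inputs))
        self.b_front = np.zeros(n_front)
        # One subnet per class (assumed: one hidden layer and one output unit).
        self.W_hid = rng.normal(0.0, 0.5, (n_classes, n_hidden, n_front))
        self.b_hid = np.zeros((n_classes, n_hidden))
        self.w_out = rng.normal(0.0, 0.5, (n_classes, n_hidden))
        self.b_out = np.zeros(n_classes)

    def forward(self, x):
        f = sigmoid(self.W_front @ x + self.b_front)      # shape (n_front,)
        h = sigmoid(self.W_hid @ f + self.b_hid)          # shape (n_classes, n_hidden)
        y = sigmoid(np.sum(self.w_out * h, axis=1) + self.b_out)  # one score per class
        return f, h, y

    def adapt_front_end(self, x, class_index, target=1.0, lr=0.1):
        """One gradient step on 0.5 * (y_c - target)**2, updating the front end only."""
        f, h, y = self.forward(x)
        c = class_index
        # Backpropagate through the chosen class subnet without modifying it.
        d_out = (y[c] - target) * y[c] * (1.0 - y[c])     # scalar
        d_hid = d_out * self.w_out[c] * h[c] * (1.0 - h[c])  # shape (n_hidden,)
        d_front = (self.W_hid[c].T @ d_hid) * f * (1.0 - f)  # shape (n_front,)
        # Only the shared front-end parameters are updated.
        self.W_front -= lr * np.outer(d_front, x)
        self.b_front -= lr * d_front
```

In this sketch, calling adapt_front_end repeatedly with a feature vector from a new speaker and the index of one vowel class changes W_front and b_front only. Because every subnet reads the same front-end output, the scores of the other classes shift as well, while the per-class weights stay as trained.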

History

School

Mechanical, Electrical and Manufacturing Engineering

Publisher

Publication date

1998

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.

EThOS Persistent ID

uk.bl.ethos.285893

Language

en

Keywords

untagged Mechanical Engineering not elsewhere classified

Licence

CC BY-NC-ND 4.0
