Loughborough University
Browse
Thesis-1999-Brammall.pdf (15.05 MB)

An investigation into the use of linguistic context in cursive script recognition by computer

Download (15.05 MB)
thesis
posted on 2010-11-16, 10:56 authored by Neil H. Brammall
The automatic recognition of hand-written text has been a goal for over thirty five years. The highly ambiguous nature of cursive writing (with high variability between not only different writers, but even between different samples from the same writer), means that systems based only on visual information are prone to errors. It is suggested that the application of linguistic knowledge to the recognition task may improve recognition accuracy. If a low-level (pattern recognition based) recogniser produces a candidate lattice (i.e. a directed graph giving a number of alternatives at each word position in a sentence), then linguistic knowledge can be used to find the 'best' path through the lattice. There are many forms of linguistic knowledge that may be used to this end. This thesis looks specifically at the use of collocation as a source of linguistic knowledge. Collocation describes the statistical tendency of certain words to co-occur in a language, within a defined range. It is suggested that this tendency may be exploited to aid automatic text recognition. The construction and use of a post-processing system incorporating collocational knowledge is described, as are a number of experiments designed to test the effectiveness of collocation as an aid to text recognition. The results of these experiments suggest that collocational statistics may be a useful form of knowledge for this application and that further research may produce a system of real practical use.

History

School

  • Science

Department

  • Computer Science

Publisher

© Neil Howard Brammall

Publication date

1999

Notes

A Doctoral Thesis. Submitted in partial fulfillment of the requirements for the award of Doctor of Philosophy of Loughborough University.

EThOS Persistent ID

uk.bl.ethos.313328

Language

  • en

Usage metrics

    Computer Science Theses

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC