Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/24768

Title: Document spanners: From expressive power to decision problems
Authors: Freydenberger, Dominik D.
Holldack, Mario
Keywords: Information extraction
Document spanners
Regular expressions
Xregex
Patterns
Word equations
Decision problems
Descriptional complexity
Issue Date: 2017
Publisher: Springer Verlag / © The Authors
Citation: FREYDENBERGER, D.D. and HOLLDACK, M., 2017. Document spanners: From expressive power to decision problems. Theory of Computing Systems, DOI: 10.1007/s00224-017-9770-0
Abstract: We examine document spanners, a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models, namely patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
Description: This is an open access article published by Springer and distributed under the terms of the Creative Commons Attribution Licence, https://creativecommons.org/licenses/by/4.0/
Version: Published
DOI: 10.1007/s00224-017-9770-0
URI: https://dspace.lboro.ac.uk/2134/24768
Publisher Link: http://dx.doi.org/10.1007/s00224-017-9770-0
ISSN: 1433-0490
Appears in Collections: Published Articles (Computer Science)

Files associated with this item:

File: document spanners.pdf
Description: Published version
Size: 866.74 kB
Format: Adobe PDF

 


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.