Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/24360

Title: Document spanners: From expressive power to decision problems
Authors: Freydenberger, Dominik D.
Holldack, Mario
Keywords: Information extraction
Document spanners
Regular expressions
Regex
Patterns
Word equations
Decision problems
Descriptional complexity
Issue Date: 2016
Publisher: Schloss Dagstuhl – Leibniz Center for Informatics
Citation: FREYDENBERGER, D.D. and HOLLDACK, M., 2016. Document spanners: From expressive power to decision problems. Presented at the 19th International Conference on Database Theory (ICDT 2016), Bordeaux, France, Mar 15-18th.
Series/Report no.: Leibniz International Proceedings in Informatics, (LIPIcs);48
Abstract: © 2016 Dominik D. Freydenberger and Mario Holldack. We examine document spanners, a formal framework for information extraction that was introduced by Fagin et al. (PODS 2013). A document spanner is a function that maps an input string to a relation over spans (intervals of positions of the string). We focus on document spanners that are defined by regex formulas, which are basically regular expressions that map matched subexpressions to corresponding spans, and on core spanners, which extend the former by standard algebraic operators and string equality selection. First, we compare the expressive power of core spanners to three models - namely, patterns, word equations, and a rich and natural subclass of extended regular expressions (regular expressions with a repetition operator). These results are then used to analyze the complexity of query evaluation and various aspects of static analysis of core spanners. Finally, we examine the relative succinctness of different kinds of representations of core spanners and relate this to the simplification of core spanners that are extended with difference operators.
Description: This is an Open Access Article. It is published by Schloss Dagstuhl under the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/
Version: Published
DOI: 10.4230/LIPIcs.ICDT.2016.17
URI: https://dspace.lboro.ac.uk/2134/24360
Publisher Link: http://dx.doi.org/10.4230/LIPIcs.ICDT.2016.17
ISBN: 9783959770026
ISSN: 1868-8969
Appears in Collections: Conference Papers and Presentations (Computer Science)

Files associated with this item:

File: 16.pdf
Description: Published version
Size: 577.89 kB
Format: Adobe PDF

 


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.