Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/24363

Title: A logic for document spanners
Authors: Freydenberger, Dominik D.
Keywords: Information extraction
Document spanners
Word equations
Regex
Descriptional complexity
Issue Date: 2017
Publisher: Schloss Dagstuhl – Leibniz Center for Informatics
Citation: FREYDENBERGER, D.D., 2017. A Logic for Document Spanners. Presented at the International Conference on Database Theory (ICDT 2017), Venice, Italy, Mar 21-24th.
Abstract: Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework is that of core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by Freydenberger and Holldack (ICDT 2016), there is a connection between core spanners and ECreg, the existential theory of concatenation with regular constraints. The present paper further develops this connection by defining SpLog, a fragment of ECreg that has the same expressive power as core spanners. This equivalence extends beyond equivalence of expressive power, as we show the existence of polynomial-time conversions between this fragment and core spanners. This even holds for variants of core spanners that are based on automata instead of regular expressions. Applications of this approach include an alternative way of defining relations for spanners, insights into the relative succinctness of various classes of spanner representations, and a pumping lemma for core spanners.
Description: This is an Open Access Article. It is published by Schloss Dagstuhl under the Creative Commons Attribution 4.0 Unported Licence (CC BY). Full details of this licence are available at: http://creativecommons.org/licenses/by/4.0/
Sponsor: This research was supported by Deutsche Forschungsgemeinschaft (DFG) under grant FR 3551/1-1.
Version: Published
DOI: 10.4230/LIPIcs.CVIT.2016.23
URI: https://dspace.lboro.ac.uk/2134/24363
Publisher Link: http://dx.doi.org/10.4230/LIPIcs.CVIT.2016.23
http://edbticdt2017.unive.it/?main
Appears in Collections: Conference Papers and Presentations (Computer Science)

Files associated with this item:

File: Freydenberger.pdf
Description: Published version
Size: 609.67 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.