Loughborough University
Leicestershire, UK
LE11 3TU
+44 (0)1509 263171

Loughborough University Institutional Repository

Please use this identifier to cite or link to this item: https://dspace.lboro.ac.uk/2134/21761

Title: An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor
Authors: Parker, Samuel J.; Chouliaras, V.A.
Keywords: OpenCL; FPGA; Heterogeneous computing; Multi-core; Compilation
Issue Date: 2016
Publisher: © Elsevier
Citation: PARKER, S.J. and CHOULIARAS, V.A., 2016. An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor. Journal of Systems Architecture, 68, pp. 17-37.
Abstract: Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and have thus evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment that enables heterogeneous system design through the use of OpenCL and a highly configurable VLIW chip multiprocessor (CMP) architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing, which better maps onto the ILP processor cores of the LE1 architecture. This paper discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and the Rodinia suites. The benchmarks are executed on 40 LE1 configurations, with 10 implemented on an SoC-FPGA and the remaining 30 on a cycle-accurate simulator. Across the 12 OpenCL benchmarks, results demonstrate near-linear wall-clock performance improvement of 1.8x (using 2 dual-issue cores), up to 5.2x (using 8 dual-issue cores) and, in one case, super-linear improvement of 8.4x (FixOffset kernel, 8 dual-issue cores). The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature.
Description: This paper is embargoed until December 2017.
Version: Accepted for publication
DOI: 10.1016/j.sysarc.2016.06.003
URI: https://dspace.lboro.ac.uk/2134/21761
Publisher Link: http://dx.doi.org/10.1016/j.sysarc.2016.06.003
ISSN: 1383-7621
Appears in Collections: Closed Access (Mechanical, Electrical and Manufacturing Engineering)

Files associated with this item:

File                     Description       Size     Format
opencl-framework.R3.pdf  Accepted version  3.19 MB  Adobe PDF

 

