This thesis by publications addresses issues in the architecture and microarchitecture of
next-generation, high-performance streaming Systems-on-Chip by quantifying the most
important forms of parallelism in current and emerging embedded system workloads.
The work consists of three major research tracks, relating to data-level parallelism, thread-level
parallelism and the software-hardware interface, which together reflect the research
interests the author has developed over the last nine years.
Published works confirm that data-level parallelism is widely accepted as the most
important performance lever for the efficient execution of embedded media and telecom
applications, and that it has been exploited through a number of approaches, the most efficient being
vector/SIMD architectures. A further, complementary and substantial form of parallelism
exists at the thread level, but it has not been researched to the same extent in the context of
embedded workloads. For the efficient execution of such applications, exploitation of both
forms of parallelism is of paramount importance. This calls for a new architectural approach
to the software-hardware interface, whose rigidity, manifested in all desktop CPUs and the
majority of embedded CPUs, directly affects the performance of vectorized, threaded code.
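As an illustration only (not code from the thesis), the sketch below shows how the two forms of parallelism compose in a simple embedded media kernel. The names `sat_add_u8`, `sat_add_parallel` and `MAX_THREADS` are hypothetical, and POSIX threads stand in for whatever threading model a given SoC provides. The inner loop carries no dependences between iterations, so it is a candidate for SIMD execution (data-level parallelism), while the stream is split into slices processed on separate threads (thread-level parallelism).

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical media kernel: saturating add of two 8-bit pixel streams.
 * No loop-carried dependences, so a vectorizing compiler can map the
 * loop onto SIMD lanes (data-level parallelism). */
static void sat_add_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        out[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}

/* Work description for one thread: a contiguous slice of the stream. */
typedef struct {
    const uint8_t *a, *b;
    uint8_t *out;
    size_t n;
} slice_t;

static void *worker(void *arg)
{
    slice_t *s = arg;
    sat_add_u8(s->a, s->b, s->out, s->n);
    return NULL;
}

#define MAX_THREADS 8 /* assumed upper bound, chosen for illustration */

/* Thread-level parallelism: split the stream into nthreads slices and
 * process each slice on its own POSIX thread. Returns 0 on success,
 * -1 on a bad thread count or a thread-creation failure. */
int sat_add_parallel(const uint8_t *a, const uint8_t *b, uint8_t *out,
                     size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    size_t chunk = n / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t start = (size_t)t * chunk;
        /* The last thread also takes any remainder. */
        size_t len = (t == nthreads - 1) ? n - start : chunk;
        slice[t] = (slice_t){ a + start, b + start, out + start, len };
        if (pthread_create(&tid[t], NULL, worker, &slice[t]) != 0) {
            /* Join the threads already started before reporting failure. */
            for (int j = 0; j < t; j++)
                pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    return 0;
}
```

Whether the inner loop is actually emitted as SIMD code depends on the compiler and its flags (for example, `-O3` enables auto-vectorization in GCC); the per-thread slicing is independent of that choice, which is why the two forms of parallelism are complementary.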
The author advocates a holistic, mature approach in which parallelism is extracted automatically
while, at the same time, the traditionally rigid hardware-software interface is optimized
to match the temporal and spatial behaviour of the embedded workload. This ultimate goal
calls for the precise study of these forms of parallelism for a number of applications executing
on theoretical models, such as instruction set simulators and parallel RAM machines, as well
as the development of highly parametric microarchitectural frameworks to encapsulate that
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy of Loughborough University.