Automatic scheduling of Halide programs
The Halide image processing language has proven to
be an effective system for authoring high-performance image processing code. Halide
programmers need only provide a high-level strategy for mapping an image processing
pipeline to a parallel machine (a schedule), and the Halide compiler carries out the
mechanical task of generating platform-specific code that implements the schedule.
Unfortunately, designing high-performance schedules for complex image processing pipelines
requires substantial knowledge of modern hardware architecture and code-optimization techniques.
In this paper we provide an algorithm for automatically generating high-performance
schedules for Halide programs. Our solution extends the function bounds analysis
already present in the Halide compiler to automatically perform locality and
parallelism-enhancing global program transformations typical of those employed
by expert Halide developers. The algorithm does not require costly (and often impractical)
auto-tuning, and, in seconds, generates schedules for a broad set of image processing
benchmarks that are performance-competitive with, and often better than, schedules
manually authored by expert Halide developers on server and mobile CPUs, as well as GPUs.
Automatically Scheduling Halide Image Processing Pipelines
Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, Kayvon Fatahalian
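To make the algorithm/schedule distinction in the abstract above concrete, here is a minimal sketch in plain Python rather than Halide. It computes the same two-stage 3x3 box blur under two evaluation orders: one that materializes the whole intermediate stage first, and one that recomputes the intermediate stage per output tile, using the region of the producer that each tile requires. All function names and the tile size are hypothetical choices for illustration; this is not the paper's algorithm and not Halide's API.

```python
# Illustrative only: plain Python, not Halide. All names are hypothetical.

def blur_x(img, x, y):
    """Horizontal 3-tap box filter at image column x, row y (1 <= x <= width - 2)."""
    return (img[y][x - 1] + img[y][x] + img[y][x + 1]) / 3.0


def schedule_breadth_first(img):
    """Schedule A: compute every blur_x value, then every blur_y value."""
    h, w = len(img), len(img[0])
    # bx[y][x] holds blur_x at image column x + 1, row y.
    bx = [[blur_x(img, x, y) for x in range(1, w - 1)] for y in range(h)]
    return [[(bx[y - 1][x] + bx[y][x] + bx[y + 1][x]) / 3.0
             for x in range(w - 2)]
            for y in range(1, h - 1)]


def schedule_tiled(img, tile=4):
    """Schedule B: for each output tile, compute only the blur_x region it needs."""
    h, w = len(img), len(img[0])
    out_h, out_w = h - 2, w - 2
    out = [[0.0] * out_w for _ in range(out_h)]
    for ty in range(0, out_h, tile):
        for tx in range(0, out_w, tile):
            y_end = min(ty + tile, out_h)
            x_end = min(tx + tile, out_w)
            # Required region of the producer: the tile's columns and
            # two extra rows, because blur_y reads three vertical neighbors.
            local = [[blur_x(img, ox + 1, iy) for ox in range(tx, x_end)]
                     for iy in range(ty, y_end + 2)]
            for oy in range(ty, y_end):
                for ox in range(tx, x_end):
                    ly, lx = oy - ty, ox - tx
                    out[oy][ox] = (local[ly][lx] + local[ly + 1][lx]
                                   + local[ly + 2][lx]) / 3.0
    return out
```

Both functions produce the same output image; they differ only in evaluation order, intermediate storage, and the amount of redundant work at tile boundaries. Choosing among such trade-offs is the kind of decision a schedule encodes.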
Automatic Optimization for Image Processing Pipelines
Image processing pipelines are ubiquitous and demand
high-performance implementations on modern architectures.
Manually implementing high-performance pipelines is tedious,
error-prone, and not portable. For my master's thesis, I focused
on the problem of automatically generating efficient multi-core
implementations of image processing pipelines from a high-level
description of the pipeline algorithm. I leveraged polyhedral
representation and code generation techniques to achieve this.
PolyMage is a domain-specific system built for
evaluating and experimenting with the techniques developed during
the course of my master's.
PolyMage: Automatic Optimization for Image Processing Pipelines
Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
Architectural Support for Programming Languages and
Operating Systems (ASPLOS),
Compiling Affine Loop Nests for Dataflow Runtimes
Designed and evaluated a compiler and runtime to automatically
extract coarse-grained dataflow parallelism in affine loop nests
to target shared and distributed memory systems. As part of the
evaluation, we implemented a set of benchmarks using the CnC
(Intel Concurrent Collections) programming model to serve as a
comparison point for our system. The implementation of the Floyd-Warshall
All-Pairs-Shortest-Paths algorithm used in the evaluation is now
part of the Intel CnC samples.
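For readers unfamiliar with the benchmark, the following is a plain sequential sketch of Floyd-Warshall in Python. It is not the CnC implementation mentioned above; it only shows why the algorithm is a natural affine loop nest: the loop bounds and array subscripts are all affine functions of the loop indices k, i, and j. The function name and the input representation (a dense matrix with `math.inf` for missing edges) are assumptions made for this example.

```python
import math


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for a dense distance matrix.

    dist[i][j] is the edge weight from i to j, math.inf if there is no edge,
    and 0 on the diagonal. The input is assumed to have no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy, leave the input untouched
    for k in range(n):            # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d
```

The outer k loop carries the dependences between steps, while the work inside one k step offers the coarse-grained parallelism that a dataflow formulation can try to expose.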